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1. Introduction 

The constituent monomers of protein-type hetero-polymers, the amino-acids of which 
there exist about twenty in nature, are composed of a common backbone and a 
differentiating side chain, and are bound via a peptide bond. These units are connected 
sequentially to form a polypeptide chain. The sequence of connected amino-acids 
defines the so called 'primary structure' of the chain. Given the primary structure, 
the mechanical degrees of freedom of the polypeptide chain are rotation angles at the 
junctions of adjacent amino-acids. They allow proteins to fold into relatively simple 
repetitive local arrangements (the 'secondary structures', such as a-helices or /3-sheets) 
which then combine into more complicated global arrangements in 3D (the 'tertiary 
structure'). The folding process is controlled by various combinations of forces, such 
as those induced by mutual interactions between the amino-acid side chains (steric 
forces, Van der Waals forces), by interactions between side-chains and the polymer's 
backbone (hydrogen and sulphur bonds), and by interactions between the amino- 
acid side-chains and the surrounding solvent (polarity induced forces and hydrogen 
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bonds). For comprehensive reviews on the physics of the interactions governing the 
folding of proteins see e.g. [TJ [2]. Apart from 'chaperone' effects (the influence of 
specialized proteins), it was discovered [3] that the dynamics of the folding process 
is for most proteins determined solely by their primary structure. Since polypeptide 
chains can vary in length from a few tens to tens of thousands of monomers, there 
is an enormous number of possible sequences. Yet only a tiny fraction of these (the 
actual biologically functional proteins) will represent chains that fold into a unique 
reproducible tertiary structure, or three-dimensional 'conformation', which determines 
its biological function. 

The protein folding problem is how to predict this conformation (the native state) 
of a protein, given its primary structure. It remains one of the most challenging 
unsolved problems in biology. Its solution would have a big impact on medicine. 
The physicist's strategy in this field (as opposed to bio- informatics approaches based 
on simulation, see e.g. [1] for a recent review) is to try to understand the main 
physical mechanisms that drive the one-to-one correspondence between amino-acid 
sequence and the native state. Normally this is attempted via simple quantitative 
mathematical models that capture the essential phenomenology of folding and lend 
themselves to statistical mechanical analysis El [7] and/or are easily simulated 
numerically [H 02 [10l HJJ- In the language of thermodynamics and statistical 
mechanics, it is believed that if a protein spontaneously reaches its native state at 
physiological conditions of temperature and pressure, its free energy landscape must 
possess a unique stable minimum [12]. However, calculating free energy landscapes 
for biologically functional proteins is non-trivial, because of the frustration induced 
by the local steric constraints in combination with the effective interactions via 
polarity and hydrogen bonds, especially in view of the heterogeneity of the amino- 
acid sequences sequences. In addition we would like to understand the folding pathway 
that ensures a proteins fast approach to its native state in physiological conditions, by 
avoiding kinetic traps and minimizing the various potential frustration effects [13\ 1 14] . 
Random amino-acid sequences do not fold into unique conformations, i.e. they have 
more complicated multi- valley free-energy landscapes, so one concludes that those 
sequences that correspond to proteins have been selected genetically on the basis of 
their associated free energy landscapes [HI [16] . 

There is little consensus yet as to what is the main driving force in the folding 
process. Some believe the hydrophobic-hydrophilic effect (i.e. hydrophobic side-chains 
try to avoid contact with the solvent, while hydrophilic side-chains seek to be in contact 
with it) to be the dominant factor in secondary and tertiary structure formation 
[171 fT8l [TBI [TBI ' w it n steric constraints enforcing further microscopic specificity, and 
hydrogen bonds providing a locking mechanism [19] . Others believe the folding to be 
mainly driven by the formation of intra-molecular (or peptidic) hydrogen bonds on 
top of hydrogen bonding between side-chains and the solvent [20]. Most physicists' 
studies either resort to models similar to self-avoiding walks on regular lattices [UJ 122 
(usually via graph-counting and numerical simulations) , or focus on generic properties 
of (free) energy landscapes [23[ [24l [25] , or try to exploit the one-dimensional nature of 
the poly-peptide chains [26l [27l [28] . In either case, in virtually all studies the amino- 
acid sequences are regarded as frozen disorder, over which appropriate averages are 
calculated (in statics of the free energy per monomer, in dynamics of the moment- 
generating dynamical functional). This implies that the sequences at hand must be 
'typical' within an appropriate ensemble of sequences, which presents us with a serious 
fundamental problem. Amino-acid sequences of proteins are far from random: they 
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have been carefully selected during evolution on the basis of their functionality and 
their ability to lead to reproducible folds. Thus one either has to define an ensemble of 
amino-acid sequences on the basis of the known primary sequences of real proteins that 
are being collected in biological databases, which removes the possibility to carry out 
disorder averages in the mathematical theory analytically, or one has to find a way 
to capture the essence of the observed biological sequences (as opposed to random 
ones) in simple mathematical formulae. Although some analytical studies did involve 
non-random sequences, the sequence statistics were usually not connected to folding 
quality as such [SjJl 130] • 

There is an alternative strategy in the statistical mechanical modeling of 
interacting many-particle systems with non-random disorder, which was followed 
successfully in the past for e.g. neural networks (where the synaptic connections 
between neurons represent the disorder) [311 1221 ESI [34] and for a simple mean-field 
hetcro-polymer model [35 . Rather than averaging over all amino-acid sequences 
(subject perhaps to experimentally determined constraints), one combines the process 
of secondary structure generation (folding) with a slow evolutionary process for the 
amino-acid sequences (which represents the genetic selection of free energy landscapes) 
and one couples these two processes in a biologically acceptable way. One can then 
try to solve for the 'slow' process upon assuming adiabatic separation of the two 
time-scales, using the so-called finite-n replica theory. This results in solvable models 
describing structure generation in poly-peptide chains with amino-acid sequences that 
are no longer random, but selected in a manner that correlates with the folding process, 
without having been required to capture the sequence statistics in a formula. It is 
encouraging that we know from previous studies such as [311 E21 O O [35] that in 
such models the impact of the slow genetic process is indeed generally to drive the 
systems away from multi- valley energy landscapes towards single-valley ones. 

In the present paper we take the next step in this research programme. Whereas 
|35j involved a simplified model with only polarity-induced mean-field forces, here 
we develop a theory for the coupled dynamics of (fast) folding and (slow) sequence 
selection on the basis of the more precise Hamiltonian introduced in [55], which 
includes also short-range steric forces along the chain. At a technical level our problem 
requires the diagonalization of replicated transfer matrices, for which efficient methods 
have been developed only recently [36l [37l [33 [39] . We apply these diagonalization 
methods to the present model, within the ergodic (i.e. replica-symmetric, RS) ansatz, 
and show how they lead in the thermodynamic limit to closed equations for non-trivial 
order parameters. In the context of protein folding one expects the RS ansatz to be 
appropriate. In finite dimensional replica calculations replica symmetry is known to 
break down only for small values of the replica dimension n, i.e. at high genetic 
noise levels, whereas here our interest is mostly in the regime of low genetic noise 
levels. Second, given the robustness and reproducibility of proteins' secondary and 
tertiary structures one must assume these systems to operate in an ergodic regime. 
Third, at a mathematical level, our present order parameter equations will involve only 
quantities with a single replica index, giving yet another indication that RS should 
hold. After first recovering the solutions of the order parameter equations in various 
known limits, we focus on the biologically most realistic regime of sequence selection 
at zero genetic noise levels, viz. n — > oo, where we extract the non-trivial phase 
phenomenology and derive phase diagrams analytically. We find many interesting 
phase transitions, both continuous and discontinuous, and remanence effects, all of 
which can be understood and explained on physical grounds. This is followed by a 
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Figure 1. Illustration of the chemical and mechanical degrees of freedom in our 
model. At each site i of the chain we have a discrete variable which specifies the 
local amino-acid type, and a residue angle <f)i which defines its physical location 
relative to the one-dimensional polymer chain axis (the 'backbone', drawn as 
a dashed line). In this example the number of possible orientations of each 
residue is three. The black blobs represent locations occupied by residues. The 
primary structure of the polymer (its chemical composition) is thus defined 
by (Ai, . . . , Ajv), and the secondary structure by ((f>i , . . . , <Pn)- Both types of 
variables are assumed to evolve in time, although on widely separated time-scales. 



numerical analysis of the order parameter equations for nonzero genetic noise levels, 
and by tests of the theoretical predictions against numerical simulations of the coupled 
sequence selection and folding processes. Within the limitations imposed by finite 
size and finite size effects, we find an satisfactory agreement between our theoretical 
predictions and the numerical simulations. 

2. Model definitions 

2.1. The folding and sequence selection processes 

Our model inherits much of its initial features from [26j , and represents the amino-acid 
cores as nodes in a one dimensional chain. The global conformal state of the system 
is defined by N successive angles <f> = (<pi, . . . , 0at) € Sl N of amino-acid residues, 
relative to the chain's backbone. Here f2 = {0, 2n/q, 47r/g, . ,.,(q— l)2n/q} C [0, 2n), 
where q € IN. The simplified picture is that of residues being able to rotate (with 
constraints, and limited to q discrete positions) in a plane perpendicular to the chain's 
axis. The primary structure (the amino-acid sequence) is written as A = (Ai, . . . , Ajv), 
with Ai G {1,...,A} denoting the residue species at position i in the chain (with 
A = 20 for real proteins). See also figure Q] In contrast to [26], however, the primary 
sequence will here not be drawn at random, but will be generated by an appropriate 
genetic selection process; this improves the biological realism of the model, but will 
change and complicate the mathematics significantly. We will therefore only include 
monomer-solvent polarity forces and steric forces, leaving out hydrogen bonds for 
now. Furthermore, we refine the Hamiltonian used in [26j to take into account the 
effect of the polymer's overall polarity balance on its ability to exhibit predominantly 
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hydrophilic surface residues and hydrophobic core residues; for models with fixed 
primary sequences as in this would add an irrelevant constant to the energy, but 
for models such as the present where the monomer sequences evolve in time this energy 
contribution will exert sequence selection pressure with significant consequences. In 
many of our calculations we will also choose q — 2, i.e. limit the residue angles to 
fa €E {0,7r}. This prevents us from having to generalize the diagonalization methods 
of [36[ 137] . which would probably require a separate study in itself. Thus, for a given 
realization of the primary sequence A, the folding process is assumed to be governed 
by the following Hamiltonian: 

Ff(0|A) = ^Y^^X.^X,:^ .,, 

- J s ^cos[(0 i+ i- fa) - (<pi- - o(Aj)] (1) 

i 

£(A) £ IR measures the polarity of residue A (with £ > indicating hydrophobicity and 
£i < indicating hydrophilicity) . The first term in ([1]) favours conformations where 
hydrophobic and hydrophilic avoid identical orientations, since this makes it easier for 
the polymer to find a fold that shields its hydrophobic residues from the solvent while 
exposing its hydrophilic ones. The second term represents in a simplified manner the 
effects of steric forces, characterizing each residue A by a winding 'distortion' angle 
a(A) for successive residue rotations. If a(Aj) = 0, then residue i will prefer to have an 
angle fa such that torsion along the chain is homogeneous, i.e. fa+± — fa = fa— fa—\. 
The energies J p > J s > control the relative impact of each contribution. For a fixed 
sequence one can define the partition function Zf(\) and the free energy Ff (A) for the 
equilibrium state of the folding process at temperature Tf = /3~ (in units where the 
Boltzmann constant equals ks = 1): 

2f(A)= X>P[-W0W] (2) 

F f (A) = -/T'logiJfCA) (3) 

It will be convenient to characterize the relevant chemical characteristics of amino- 
acids by the distribution 

1 A 

wit, V) = X E S ^ £( A )]% - cos[a(A)]] (4) 
x=i 

As there is no obvious structural physical/chemical link between residue polarity 
and geometric (steric) properties, we assume statistical independence, i.e. w(£, rj) = 
w(£)w(r)) (this will also induce welcome simplifications later). Typical simple choices 
fartofc) would be to(0 = e<5(£) + i(l-e) [5($-l) + + 1)] oiwtf) = %6[1-{]0[1 + S]. 
Note that we may always choose the maximum polarity to be one, since alternative 
values can be absorbed into the definition of the parameter J p . For w(rf), natural 
choices would be 10(77) = 7r -1 J^da S[rj— cos(a)] = 7r _1 [l — arccos 2 (r?)] -1 / 2 #[l— rf\9[l+rj\ 
or w(n) — ^6[1 — r)]9[l + rj\. Here the allowed value range [—1, 1] is enforced by the 
physical meaning of 77. 

We now follow [35] and complement the folding process by an adiabatically 
slow stochastic evolutionary selection process for the amino-acid sequences. The 
assumption is that this selection results from an interplay between the demands that (i) 
a sequence must lead to a unique and easily reproducible equilibrium conformation for 
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its associated folding process, and (ii) the resulting structure is useful to the organism 
(e.g. it can act as a catalyst of some metabolic or proteomic cellular reaction). If 
one takes the further step to quantify the quality of an equilibrium conformation by 
the value of the folding free energy -Ff(A) (i.e. taking 'low free energy' as a proxy 
for 'more reproducible'), together with the direct energetic cost V(X) of not having 
strictly hydrophilic 'surface residues' and strictly hydrophobic 'core residues', and 
if one assumes that biological usefulness can be measured by some utility potential 
U (A) , then the evolutionary process can be viewed as the stochastic minimization of 
an effective Hamiltonian for amino-acid sequences that takes the form 



If the stochastic minimization is of the Glauber or Monte-Carlo type, the evolutionary 
process will evolve itself to a Boltzmann-type equilibrium state, namely Poo (A) oc 
exp[— /3H e g (A)], where f3 measures the (inverse) noise level in the genetic selection |]. 
Our combined model (fast folding and slow genetic sequence selection) is thus solved 
in equilibrium by calculating the associated effective free energy per monomer 



with the noise level ratio n = (3/ ft. As in [311 [32l [33l [3H [35], this expression can be 
evaluated via the replica formalism, where n is first taken to be integer and the result 
is subsequently continued to non-integer values. Note that in this type of model the 
replica dimension has a clear physical meaning as the ratio of temperatures. For n — > 
we recover the free energy of a system with quenched random amino-acid sequences, 
for n — 1 we have that of an annealed model, whereas for n — ► oo the sequence 
selection becomes strictly deterministic. In contrast to previous coupled dynamics 
studies, however, here we have not only mean-field forces but also short-range ones: 
the steric interactions in (p}. The replica calculation will therefore be quite different. 

In this paper we limit ourselves for mathematical convenience to sequence 
functionality potentials of the simple form U{X) — Similarly we choose 

the energetic penalty V(X) on hydrophobic surface residues or hydrophilic core 
residues to be a function only of the polarity balance A; (A) = ./V -1 J^i £(Aj), putting 
V(X) = J g Nv(k(X) — k*) with a function v(k) that is minimal for k — 0, where 
k* represents the 'optimal' polarity balance that would give a protein with strictly 
hydrophilic surface residues and strictly hydrophilic core residues (which one expects 
to be close to zero). This form for V(X) would emerge naturally if all amino-acids 
were to have similar values of |£(Aj)|. The implicit assumption is that if a polarity 
balance k(X) is energetically favourable, i.e. close to k* , then the protein will be 
able to find a fold that realizes the desired geometric separation of core versus surface 
residues. We will discuss the mathematical consequences of making alternative choices 
in the discussion section. Since for N — > oo chain boundary effects must vanish, we 

| Another way to see why Poo(A) oc exp[— PH a g (A)] is a natural evolutionary equilibrium state is 
to image having real- valued A, evolving according to a Langevin equation in which the deterministic 
force is minus the gradient of the energy Hf(X) + U(X) + V(X). Given adiabatic separation of folding 
and evolution time-scales, one can then integrate out the fast variables (the conformation angles) and 
find the Boltzmann state for the sequences A with effective Hamiltonian J5}. See e.g. 1351 for details. 



H eS (X) = U(X) + V(X) - /3- 1 logZ f (A) 



(5) 
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Figure 2. Diagrams showing each of the twenty amino-acids as a point in 
the plane, with the horizontal coordinate giving its polarity value (taken from 
|43] , and normalized to the range [—1,1]), and with the horizontal coordinate 
giving the cosine of an average conformation angle (averaged over all proteins of 
a given class), left: averages calculated for conformation angle <f>; right: averages 
calculated for conformation angle tp. Top row: averaging over all a-proteins for 
which structures are available; bottom row: averaging over all /3-proteins for which 
structures are available. All conformation data were extracted from [411 142| . 

also choose periodic boundary conditions and take N even (for mathematical reasons 
which will become clear later). 



2.2. Relation between model assumptions and biological reality 

Here we discuss some of the assumptions and definitions of our model in the light of 
experimental evidence from real proteins. Our choice for a single-angle representation 
of the mechanical degrees of freedom of a monomer was motivated by our desire 
to limit the mathematical complexity, although our methods would apply also if we 
were to work with the conventional two conformation angles (<p,tp). In fact, there is 
evidence [40j to suggest that the conventional two-angle representation is redundant, 
and that only one newly defined torsion angle is needed per amino-acid to specify a 
protein's conformation. If we insist on identifying the single-site degrees of freedom 
in our model with one of the standard conformation angles ((j),ip), we have to choose 
the one that matches our statistical assumptions best. To do this we have calculated 
for individual amino-acids the average of the observed conformation angles ((/>, ip) over 
all occurrences of this amino-acid in the data base of known protein structures (the 
SCOP data base [41], [42] ) , which resulted in the graphs of figure [2j where we plot the 
cosines of the average conformation angles of all twenty amino-acids together with 
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Figure 3. Diagrams showing all proteins in data base |42| . organized into the 
four main protein families, as points in the plane, with the horizontal coordinate 
giving their inverse size N -1 and with the vertical coordinate giving their average 
polarity value fc raw = JV" 1 . §i, raw along the chain (where the polarities £i, raw 
of the constituent residues are taken directly from [43] . without normalization to 
[—1,1]) Dashed horizontal lines indicate the average overall polarity level found 
within each protein class. 



their polarity value (according to the Eisenberg scale, taken from [43] , and normalized 
linearly to the range [—1,1]). Both conformation angles (cj), ip) give averages that have 
cosines of both signs, both are biased towards positive values; however, the bias is 
more extreme in the case of ip. Since there is no such bias in our theory the most 
suitable conformation angle to correspond to the orientation degrees of freedom in our 
model appears to be 4>. In the same figure we can also see that there is no obvious 
correlation between polarity characteristics and steric characteristics. In our model 
this is assumed to be a property of the amino-acids, and we will find in our analysis 
that neither the primary structure generation nor the secondary structure generation 
introduces any such correlations. Finally, let us turn to the postulated preferred 
average polarity of any amino-acid chain (which was used in our phenomenological 
Hamiltonian) , purely on the basis of the energetic need to shield hydrophobic residues 
from the solvent and to expose hydrophilic ones. There is certainly evidence for the 
link between the average polarity of a sequence and the surface-exposure pattern of 
the associated protein structure [H]. If we plot all those proteins for which primary 
structure data are available as points in a plane, with the inverse size 1 /N as horizontal 
coordinate and the average polarity as vertical coordinate, we obtain figure [3] This 
figure supports strongly the existence of a an energetically preferred average polarity 
k*, with a value close to zero in rescaled polarity units £ E [—1, 1]. 
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3. Replica analysis of the model 

For integer n one can write the n-th power of the folding partition function 2f (A) in 
([6]) in terms of n replicas of the original system, to be labeled by a = 1 ... n. If the 
sum over the sequences A is carried out before the sum over conformations, one finds 
an effective theory in which the n replicas are coupled: 

fN =_ l ^ £ e -/m(0\...,0") (7) 
npN f-^ 1 

H(...)= _}_\ og J2e- p ^* Hti(f>C ' lX) - n0lu{X)+v{X)] (8) 
13 A 

For (3 — ► (infinite temperature) we have . .) — > — AHog A and the free energy 
retains only entropic terms, viz. lim / 3^o(/3/jv) = — log? — n log A. Upon using JT]), 
and inserting into the polarity term of the folding energy, we can work out 

the effective Hamiltonian ((SJ). If we introduce appropriate integrals over 5- functions 
(written in integral representation) to isolate the quantities A^ 1 53i £(^i)^0,0?> viz. 

J 2n 

we can carry out the sum over A in ||5J) and find, with the abbreviation z = {z a ^,}, 

N N & ^ 

A 

_L lnp. f_^dz_ BN[i £ z ait ,z^ + J p * 2 -nJ g v{± £ z «<P~ k *)] 

N J (27r//3AT)9n 

x j"| I y^ e -nj3u(A)-»/3£(A) £^ 2 Q *^^ ? +/3J S XL cos [Ct-i+0f-i-20f -a(A)] | 

Inserting this into ([7]) leads to an expression for the asymptotic free energy per 
monomer / = limjv^oo/jv that can be evaluated by steepest descent. Upon 
eliminating the conjugate integration variables {z^} by variation of {z^}, giving 
iz a <t> = Jg v ' (~^J2 a <t> z a<t> — ^*) — 2Jp z a<fr, an d upon defining the replicated single-site 
vectors <p i = (<f)\, . . . , 0") the result takes the form / = extr z </j„(z) with 

Ot(j) Ot(j> Ct<f> Ot(j) 

-IlogA-Jim^^log ]T n M ^'^'^iW ( n ) 

0\..0 n 1 



A=l 

X e^ Js Ea cos[0° +1 +0°_ 1 ^20°-a(A)]-n0«(A) ^) 

We recognize in ()llll2j) a replicated transfer matrix product embedded within a mean- 
field calculation, and conclude that this model is therefore in principle solvable. The 
only amino-acid characteristics that affect the folding process are its polarity £ (A) and 
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steric angle a(A), so we will from now on choose the single site functionality potential 
to have the form u(X) = /x£(A) + ^cos[a(A)] (where fi and v are control parameters). 

3.1. The case q = 2 

Our calculations become significantly simpler and more transparent for q = 2. Here, 
after a uniform basis rotation, the allowed residue angles are 4>i G {— 7r/2, 7r/2}, which 
can be written in terms of Ising spin variables o~i G { — 1,1} as <pi — o~in/2. We 
transform the 2n remaining replicated order parameters, which can be written as z a ±, 
into new order parameters m a = z a+ — z a ^ and k a — z a+ + Our equations 

will now involve the replicated spin variables <Tj = (o\ , . . . , erf), and the cosine term 
in the exponent of the transfer matrix simplifies to <rf +1 cr"_ 1 cos[a(A)]. With the 
short-hands m = (mi,...,m„) and k = (fei, . . . , k n ), (g(£,))(, = fd£ and 
(g(i])) v — fdr) w(r])g(r]), our previous expressions (|ll|12p take the form 

ny> n (m,k) = ij p (k 2 +m 2 ) + nJ g ^(-VX-fc*) - (- Y"k a )v'(- Vfc Q -k*) 

a a a 

- ilogA- Jrm^-^rlog E J\M[ai- X ,cri, 0-1+1 l m ' k ] ( 13 ) 



CT\ ... (T\ i 

M^r, CTi, <r i+1 \m, k] = (e K [ J » Y.J k -+ m ^)-^-nJ g v- (i £ a *«-fc*)]] ^ 



The disconnection inside M[. . . \ . . .] of the factor involving er^ from that involving 
<T i+1 • (Ji-i allows us to rewrite tp n (m, ip) into a more convenient form, with a new 
replicated transfer matrix r(m, k) that involves only the two sites i — 1 and £ + 1: 

J^[M[cr i _i,cr i ,(T l+1 |m, k] = ]J l (Ti+1 (m, k) (15) 

i i 

where 

rcr(T'(m, k) = ^ e ' 3 'j['-'s (T - (T '-'H^ ^ e /3C[./ P (J] c ,' £ °+ m - cr )- n A'-™- r 9' u '(^ XL fe °- fc *)l^ (i6) 

Since N is even and we have periodic boundaries, even sites thereby disconnect from 
odd sites. The trace in ip n is now in leading order for large N expressed in the usual 
manner in terms of the largest eigenvalue A(m, k) of the matrix r(m, k): 

<T l ...CTjv i 

= lim 1 logTr^/^m, k)] - log A(m, k) (17) 

N—>oo 1\ 

In fact, the specific dependence of </?„(m,k) on k via (fH)) is such that all its saddle- 
points will have k = fc(l, . . . , 1). This reduces the number of order parameters from 
2n to n + 1. We now have / = extr mj fc<p„(m, fc),with 

1 m 2 i 

¥> n (m,k)= o J p( — + fc2 ) + J g [v(k-k*)-kv'(k-k*)] -— logA 

2 ?i L J np 

-^-logA(m,fc) (18) 
rcrcr'( m '^) = ^ e ,3, '[ Js ' T '' T ' _ ™ iy ]) ^ e ™/ 3 4[' / p( fc +"~ lm '' T )-M-^si>'(fc-fc*)]^ (19) 
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Our problem has been reduced to the diagonalization of the 2" x 2" replicated transfer 
matrix (fT§|) . This matrix can be simplified to a form analyzed in [36, 37, 38, 39 upon 
making the so-called replica symmetric (RS) ansatz, which is equivalent to assuming 
ergodicity. Since the order parameters in the present model have at most one replica 
index, one expects RS to be exact at all temperatures. Now one has m a = m for all 
a, which simplifies our solution to / = extr m ^k<fiRs(jn, k) in which 



ip n (m, k) = -Jp{in 2 + k 2 ) + J g [v(k— k*)~ kv' (k — k*)] 
log A + log X RS (m,k) 



(20) 



[3n 

where Ai?,s( m j k) is the largest eigenvalue of 

r£ s CT/ (m,fc) = (e^ (T - (T '- n ^) ri (e nK[J ^ k+ ^ E,,*-)-/'-^*-**)]) (2 1) 
Working out the saddle-point equations for {m, k} from (|20|) leads us to 

m= ^^ l0gARs(m ' fc) (22) 

" = Pn[J p -J^k-k*)^ k l0SX ^ k) (23) 

An alternative (but equivalent) form for our order parameter equations that does not 
require differentiation of Ars(t7j, £;) is obtained if we extremize (p n (m,k) at the stage 
where it is still expressed in terms of a trace of powers of the matrix Trs^j k), viz. 



f3nJ p w->oo dm 8 L RS 



p 

= _J_ lim Tr[^r RS (m,fc).r^ 2 Kfc)] 
PnJpN^oo Tr[r^ 2 (m,fc)] 

= 2 lim^oo^log Tr[r^ s /2 (m,fc)] 

(3n J p -J g v"{k-k*) 
= J_ lim Tr[^r RS Kfc).r^ s /2 (m,fc)] 

Pn w-oo [j p -j gV »(k-k*)]Tr[r^ 2 (m,k)} 

Upon working out the partial derivatives of r R s(TO, k), and upon writing the left- 
and right eigenvectors of rRs(m, k) corresponding to the largest eigenvalue as {w^.} 
and the limit N — > oo can be taken. To avoid unwieldy equations we drop 

the explicit mentioning of the arguments (m,k) for quantities such as ArSj or 
from now on; the formulae should make this dependence clear. Using the replica 
permutation invariance of RS equations, the result can be written as 

_ _ Eerer ^iW'W^ 



Ars Eer u a u a 
Eerer u^Yctct'U^. 



(26) 
(27) 



where 



Ars E 

Y(j(j' = {e t3ri ^ JsCT ' a ''^ n ^) ^ e "^W ,:+ «E„' , <')^' , »"'^' i; *^ (28) 
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Finally, the physical meaning of the order parameters to and k, expressed in terms 
of the original variables {cTj, A.;} and averages over the equilibrated coupled relaxation 
processes, is found to be (see | Appendix A[ ): 

171 = i im 4E^( A ^» (29) 

N-^-oo iv * — 4 
i 

i 

(with double brackets ((•••)) denoting equilibrium averages over both the fast 
secondary structure formation process and the slow sequence selection process). 
Within the present model we may interpret to = 0, where the equilibrium amino- 
acid residue orientations are uncorrelated with amino-acid species, as describing a 
'swollen' state where secondary structure fails to develop (although, as we will find, 
for rn = there could still be phase transitions in terms of the amino-acid statistics, 
as measured by the order parameter k). States with to ^ would exhibit secondary, 
and by construction (via the polarity term in the folding Hamiltonian) also tertiary 
structure, so should be described as 'collapsed' states. 



3.2. Solution of the replicated eigenvalue problem 

It was argued in [36] that the left and right eigenvectors {w^-} and corresponding 
to the largest eigenvalue of matrices of the class (|2"TT) are of the following form: 

v% = jdx${x)e^ a °» (31) 

i& = JdyV(y)e Py ^^ (32) 

Inserting (|31l32p into the right/left eigenvalue equations J2cr> ^crtr' u cr' = ^RS u <t an d 
2<r' r <rcr' u tr = ^RSwJx, followed by use of the identity g(±l) = exp[/3(_B ± A)} with 

A= W kskWM- 1 )]' B = p to&W)9{-V\ (33) 

leads us to a re-formulation of our eigenvalue problems in terms of integral operators, 
where the role of n has changed from controlling the dimension of the problem (limited 
to integer values) to that of a simple parameter that can be continued to the real line: 

Ars$(x) = Jdx'A^(x,x')^(x') (34) 

Ars^) = Jdx'A 9 (x,x')V(x') (35) 
With help of the short-hands 

A(x, y) = i tanlT 1 [tanh(/?x) tanh(/3y)] (36) 

P 

B(x,y) = i-log[4cosh[/3( 2 : + y)]cosh[/3( 2 :-y)]] (37) 
one can write the kernels in (|34I35[) (of which again we seek the largest eigenvalue) as 

A^(x,x') = ^ < 5[x-eJpTO-A(x',77J s )]e"^ [B(x ^ Js)+ « (J ^-^ J ^' (fe - fe * )) -'- / ' )1 ^ c 
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Avfaaf) = (^5[x-A{x'+^J p m,r 1 J s )]e n0[B{x ' + ^ m ^ 

(39) 

Both kernels A$(x, x') and A^(x,x') take only non-negative values, so the eigenvalue 
problems (134135)) support solutions where $(x) > and 9(x) > for all x E 1R. 
We may then normalize these functions according to Jdx <&(x) = Jdx ^>(x) = 1 
and interpret both, in view of (|31l32p , as field distributions. A consequence of this 
normalization convention is that we obtain two relatively simple (and equivalent) 
expressions for the eigenvalue Ars upon integration of (|34I35|) over x: 

\ RS =Jdx vl/(. x )^ e ™^[- B ( ;l: +5 J p m ' ? ' J =)+^(• 7 p fc ^^ I ^ J 9 ^ ''( fc - fc *))- ,y, ^]^) (41) 

Given the normalized solutions \&(x) and 9{x) of ()34|35|1 with the largest eigenvalue, 
which will generally have to be obtained by numerical iteration, we can work out the 
remaining contributions to our order parameter equations (|26I27[) . such as 

u^u% = 2" f dxdx'$(x')$(x) cosh"[/3(a; + x')\ (42) 

u a Y <T<T<u^, = 2" j dxdx'$(x')*(x)^ee n/5[iJ(:,: ' r ' Js)+4(Jpfe_AI ~ J9t '' (fe_fe * )K, ' l/1 
era' •* 

x cosh n [(3{x + £J p m + A(x',r}J s ))})) (43) 
Y v^-axYaa'U^, = 2 n J dxdx'^x')^(x)(^tanh[P(x+^J p m+A(x',r]J s ))] 

CTCT' 

x e n0[B(x',7iJ s )+i(Jpk-iJ,-Jgv'(k-k*))-riv] 

x cosh n [(3(x + £J p m + A(x',r}J s ))})) (44) 



3. 3. Simplified form of the theory 



Equations (|26I27I34I35P (where we need the eigenfunctions with the largest eigenvalue) 
together with the supporting expressions (|38I39I40I41I42I43I44|1 constitute a closed set 
of equations for the RS order parameters {m, k, $(x), ^(x)} of our model. We now 
simplify this set further. First we define the following polarity probability density: 

w l^\ e nK{Jpk-fi-J g v'{k-k*)) 
P(0 = J d £, w ^ e n/3£'(J p k-ij.-J g v'(k-k*)) ( 45 ) 

It represents the amino-acid statistics that would have been observed in the absence 
of the fast process (see | Appendix A[ ). If we normalize the eigenfunctions {$(x), ^(x)} 
according to Jdx $>(x) = J dx ^>(x) = 1 we find that they are to be solved from 

Jdx'$(x') Jdrj w{n)8[x-^J p m-A{x^J s )]e n ^ B(x '^ J ^- u ^ \ 



Jdx'$(x') Jd-q w (j 1 )e n ^ B{ - x '' 1 i J ^- ur i\ J 

(46) 

fdx'fdtp(Z)V(x'-tJ p m) Jdrj w{rj)5[x - Aix'^J^e^^''^-^ 
Jdx' fd£ p(£)V(x'- £,J p m) Jdrj w (r])e n ^ B ( x '' r ' J '>)-^ 

(47) 
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The variables x in <S>(x) and ^{x) have the dimension (in spin language) of fields, so 
$(a;) and ^{x) must represent field distributions. In fact they are connected in a very 
explicit way: they can be expressed in terms of each other via 

Jdx'^(x') Jdrj w(rj)5 [x - A(x', rjJ s )] e »/5[sO'^)-H 
= fdx'<S>(x>) Jdn u ,( r? )e«/3[B(x', I ,J s )- I ,r ) ] ( 48 ) 

$(z) = Jd£ p(0*(z - J P m£) (49) 

One proves these statements by substituting |48p into (j4"T|) and (f4"9")> into which 
shows in either case that both sides of the respective equation are identical. The 
remaining eigenvalue problem (I4T[) is still nontrivial, but some properties of its 
solution(s) can be established easily. First, it follows from | tanh(/3A(a; / , r)J 8 ))\ = 
| tanh(/?a; / ) tanh(/3?/J s )| < tanh(/3|f/| J s ) that any solution ^(x) must have ^>(x) = 
for |a;| > J s max^^^^Q \r]\. Second, as soon as J s > and J p m ^ there cannot be 
solutions of the trivial form ^(x) = 8(x — x*) for finite n. This is clear upon inserting 
\&(a/) = S(x' — x*) into the right-hand side of (|47|) : for J s > and J p m ^ there will 
always be multiple values of A(x* ,r/J s ) (since r\ and £ take multiple nonzero values), 
so it is impossible for the right-hand side of (|47| to produce a (^-function. 

We can now eliminate the distribution §(x) and its eigenvalue problem from our 
theory, and reduce our order parameter equations to a set involving {to, k, ^(x)} only. 
The function ^ is still to be solved from the eigenvalue equation (|4T|) . whereas our 
two scalar order parameter equations can now be made to take the transparent form 

to. = J d£dh W{h,C) £tanh[/3/i] (50) 
r d?dh W(h,0 £ (51) 



with the joint equilibrium distribution W(h, £) of local effective fields and polarities: 

P (Q cosh n [f3h}fdx *(xMh-x-J p m£) 
1 ,5J JdC'^'p(C')cosh n [/5^]/dx ^(ajJ^'-a-Jpm^') 1 j 
Upon calculating the equilibrium distribution 

tt(£, t?) = ^ - £(A0Mr» - coB[a(Ai)]] » (53) 

(see |Appendix A[ ) one finds that 7r(£, 77) = 7r(£)7r(?7), and that 7r(£) — Jdh W(h, £). 
The equilibrium distributions 7r(£) and 77(77) will generally differ from the prior 
distributions w(£) and 111(77) that would be found upon simply drawing ammo-acids at 
each site randomly and independently. However, the factorization 7r(£, rj) = ir(t;)Tr(r)) 
tells us that, although it impacts on amino-acid statistics, in the present model the 
sequence selection process does not induce correlations between polarity and steric 
angles. 

Given a solution of equations (|47I50I51[) we can evaluate whether it is the physical 
one (i.e. the one with the lowest free energy) by calculating (f^Uj) . which now takes the 
simple form 

tp = \j p {m 2 + k 2 ) + J g [v(k-k*)-kv'{k-k*)] - l0gA 



2 pv ' ' ' SL v ' v n (3n 
1 

fin 



1 log / d£ w (Q e nmJpk- fl -J 9 v'(k-k*)) (54) 
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- ilog J dxd£_ p(£)V(x-J p mt) J dn w {ri)^ B ^ J ^- v ^ 

4. Solution of order parameter equations for special cases 

4-1- The state without secondary structure 

Our equations always allow for solutions with m = 0, describing states where no 
secondary structure develops. To see this we first note that now ty(x) = <&(x) for any 
x, that (|36I37[) obey A(— x,y) = —A(x,y) and B(—x,y) — B(x,y), and that 

Ay(x,x'\0,k) = (<5[x-A(x',r;J s )]e"' 9[s(a; ''' Ja) - ,y ' ?I ) I) (e n ' 3 « ( ^ fe -' J - J <' , '' (fe - fe * )) ) e (55) 

Due to the above symmetries of A(x, y) and B(x, y), one has A^(x, x') — A$(— x, —a/), 
so A* commutes with the parity operator. Its eigenfunctions are therefore either 
symmetric or anti-symmetric. The anti-symmetric eigenfunctions are ruled out by the 
requirement \&(x) > 0, so we conclude that \I/(x) must be symmetric in x, and that 
therefore W(— h, £) = W{h,£). From this it follows, via the saddle-point equation for 
m, that m = indeed solves our equations for any choice of the control parameters. 
The distribution ^(x) is for m = to be solved from 

Jdx'^(x') Jdn w{n)S[x- A(x',nJ s )]e n ^ B ^ J ^^ 
= fdx'*(x') Jck) w ( v )e n K B (*'>nJs)-vv] (56) 

This equation has the trivial solution 'fy(x) — 5(x), which is in fact unique. To prove 
uniqueness we use | tanh[/L4(x', r]J s )]\ = | tanh(/3a;')| tanh(/3J s ). Since ^(x) = for 
|x| > J s we can define the largest interval [— u,u] C [— J s , J s ] such that \&(x) = for 
x ^ [—it, u]. Inside the numerator of (|56p we now know that any nonzero contribution 
to the integral must have \x'\ < u, so | tanh[/L4(x', 7?J S )]| < | tanh(/3u)| tanh(/?J s ). 
Hence equation (|56|) tells us that if \&(x) 7^ then | tanh(/3x)| < tanh(/3J s )| tanh(/3u)|, 
but now one must also have |tanh(/3u)| < tanh(/3J s )| tanh(/3u)|. Clearly the only u 
that satisfies the latter inequality is u = 0, which completes the proof that $(x) = S(x). 

Furthermore, upon inserting ^(x) = 5(x) equation (j52"|) tells us that W(h,£) = 
p(£)8(h), with p(£) given by (|4"5)) . This makes sense, since the contribution to the 
fields that depends on the polarity does so via the mean-field forces, which are absent 
for m — 0, whereas in the absence of long-range folding forces the remaining one- 
dimensional chain cannot order (hence all effective fields are zero). The saddle-point 
equation for k can be simplified to 

fd£ £ ■^£) e "' 3 £[ J p fc ~^~ J 9 t ''(' c ~' c *)l 

k = Jd£_ w (£) e n(3Z[J P k-»-J g v'(k-k* )] ( 5? ) 

This equation shows that even for m = (i.e. no secondary structure) there is still 
an effect of the coupling between sequence selection and residue orientation: there 
will still be an effective preference for homogeneous sequences, due to the increased 
potential for energy gain (via J p ) if monomers are of the same type, which is however 
counter-acted by the energy cost of polarity homogeneity as controlled by J g . 

Finally, using the above results as well as B(0,nJ s ) = (i~ x log [2 cosh(/3fJ s )], the 
free energy of the m = state is seen to take the value 

n0£[J p k— fj,— J g v'(k— k*)] 



<p = -J p k 2 + J Q \v(k-k*)-kv'(k-k*)] -^log / ' d£ w(£)e 
2 1 J pn J 



l0g2 _ logA _ 1 - v e nPQgco«M^J.)-/J^] (58) 

p fin (in 1 
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4-2. Analytical solution in verifiable limits 

4-2.1. Infinite temperature. In the infinite temperature limit (3 — > one has 
A(x,y) = (3xy+0((3 3 ) and B(x,y) = (3" 1 log 2+±(3(x 2 +y 2 )+0((3 3 ). From this we can 
immediately extract the following solution of our saddle-point equations (|47I50I51[) : 

lim *(x) = S(x), lim W(£,h) = w{£)6{h) (59) 



lim m = lim k — df £ w(£) (60) 
p^o p^o J 

The corresponding value for the free energy (154|) is 

lim (3f = -n" 1 logA-log2 (61) 

This is just the (3 — > limit of the m = (swollen) state. We recognize the free 
energy reducing to the entropic contributions from the angular (— log 2) and from 
the sequence (— (fin)" 1 log A) degrees of freedom, and the average polarity k reduces 
to that of the amino-acid pool. All this is easily understood on physical grounds. 

4-2.2. Random sequences: n — ► 0. According to n = (3/(3 this limit describes the case 
where monomer sequences are selected fully randomly, independent of the functionality 
potential or the secondary structure they would generate. Our equations must for 
n — ► therefore reproduce the theory developed for random hetero-polymers in |26j . 
provided we set the hydrogen bond coupling in [26j to zero. Here we find for n — > 
that our equations indeed simplify considerably. As expected we obtain = w(£), 
since sequences are selected randomly from the amino-acid pool, and hence k = 
We are then left with the following eigenvalue problem for ty(x): 

*(s)= [dy^(y)((S[x-A(y+J p m^r,J s )])k, v (62) 



with $(x) = (ty(x — J p m£))^. But now the order parameter m (which measures the 
degree of orientation specificity along the chain of hydrophobic versus hydrophilic 
residues) is to be solved from 



m =(£ J dh W(h\£) taxihlfih])/: (63) 

W(h\0 = Jdx y{x)^{h-x-J p m£) (64) 
Equivalently, upon using (jl?2"j) : 

m= Jdxdx l ^(x r )^(x)((£tanh[f](x+£J p m + A(x',r]J s ))})) i:V (65) 
The corresponding free energy per monomer is 

lim (/+^) = \jv{™?~ (Of) + + "{n)v + ^««0e-**) 

- Jdx $(x){B{x,r)J s )) n (66) 

The theory for random sequences in [26j (based on random field techniques rather 
than the replica formalism) involved as its main order parameter the distribution 
PcofelftJpm) of three ratios k = (kx,k2,ks) of conditioned partition functions 



A solvable model of the genesis of amino-acid sequences via coupled dynamics 17 

(condition on the values of the last two spins of the chain). The link between our 
present equations and those in [26j is made via the identification 

P 0o (k\(3J p m)=5{k 3 -fa) Jdxdy ^>(x)^(y)S(k 1 -e 2Py )S(k 2 -e 2(3(x - y) ) (67) 

Using = (^(x — J p m^))^, equation and the relation A(x,y) = 

(2/3) -1 log[cosh(/3a; + (5y)j cosh(/3ir — (3y)\ one proves that Ip7|) obeys 



Poo (k|/3 J p m) = J dk'P^ik'lfiJpm) 



^(k'lP J S T)) 
F 2 (k'\f3J 8 r,,[3J p mt;) 
F 3 (k'\j3J s r)) 



with 



km*) - e zt h + :z WW) = e :: k i\ +e : x m 

e x k\K2+e x e x k\k2 + e x 

K{k\x) = ^t^^ (69) 
e x k<zk§ + e x 

which is indeed the limit Jub —>■ of the equation derived for q = 2 in [26) . 

4-2.3. Mean field limit. A second limit which can be verified using earlier work is that 
where J s — > and J g — > 0, describing the coupled dynamics of sequence selection and 
secondary structure generation in heteropolymers with (one type of) polarity energies 
only, the simpler case studied in [35 . In this limit the model contains only mean-field 
forces, and no longer involves transfer matrices. Using the identities A(x, 0) = and 
B(x,0) = P~ 1 log[2 cosh((3x)] one extracts from (|4"T| that ^>(x) = S(x), and so 

m = Jd£dhW{h,C> £tanh[/?/i] (70) 

k = J d^dh W(h,0 £ (71) 

w(h ~ = co S h n [(3h}p(Q5[h~^J P m} 

[ ' U fdh>cosh n l[3h'}fd?p(?)S[h'-eJ p m] [ ' 

As could have been expected, the equations for the scalar order parameters (m, k) 
already close onto themselves. In explicit form they are 

_ (£tanh(/3£J p m)e™' 3 «( J » fc -^ cosh n (/3£ J p m)) 6 



(73) 
(74) 



( e n/3f(^fc-M) C osh"(^Jprn))j 
_ (^ e "^(Jpfc-M) cosh n (/3g J p m))z 
~ (e"«Mp*-/0 cosh"(/3£ J p m)) 4 
The amino-acid statistics in equilibrium are given by 

w (0e n/3C(Jp ^ A ' ) _ w(??)e-™ /3,y '' 

Finally, using $(#) = fd£ p(£)5[x — ^J p m] we may work out the value of the free 
energy per monomer for J s = 0: 

- -J- log / d£ cosh n (/?£J p m)e Tl/3 « (J >> fc ^ ) (76) 
/9n 7 
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The specific model studied in [35] had £ S { — 1, 1} and w(£) = 5(^,1 + <^,-i)j i.e. no 
varying degrees of hydrophobicity or hydrophilicity and a polarity-unbiased amino- 
acid pool. Steric effects did not come into play in [35] (monomers were characterized 
only by their polarity), so we may here simply take v = 0. These choices simplify our 
two remaining order parameter equations (|73l74p to 

m = tanh(/3J p m), k = tanh[n/3(J p fe — /x)] (77) 

These equations are indeed identical to those of [35], given q — 2. Similarly, for 
w(£) = 5(^,1 + <^,-i) an( i v — the free energy per monomer (|76p now simplifies to 

Hm / = ij p m 2 - l0g ^ /2) - ± log[2 cosh(/3 J p m)] 

+ i J P fc2 - -^-log[2cosh(n/3(J p fc-^))] (78) 
2 pn 

Apart from the excess entropy — log(A/2)//3n due to the extra chemical degrees of 
freedom of our present monomers compared to the ones in [35] (and modulo a trivial 
typo in [35]) this is indeed the free energy expression found in [35J for q = 2. 



5. Transitions and phase diagrams for deterministic sequence selection 

We now turn to nontrivial regimes where analytical solution is still possible, but where 
our model does not map onto any existing model in literature. The biologically most 
relevant regime is that of low or even absent genetic noise levels, viz. n — > 00. We 
still have to select a form for the polarity balance potential. Since v(k — k*) must 
be minimal at k = k* and increase monotonically with \k — k*\, we choose a simple 
quadratic form v(u) — \u 2 . Thus from now on we will have v'(u) — u and v"(u) = 1. 

Since n = P/(3, the limit n — > 00 corresponds to the case where monomer 
sequences are selected fully deterministically, such as to minimize the effective 
Hamiltonian ([5j. Here, in view of many exponents in our equations growing with 
n, we may evaluate virtually all integrations by steepest descent. With a modest 
amount of foresight we define the canonical polarity balance ko as 

Clearly limj «x> — k* , and limj g _,o ko = /j/ J p . To keep our analysis as transparent 
as possible we will not consider pathological parameter coincidences but restrict our 
discussion to the generic scenario where J g ^ J s , v ^ 0, and ko € (—1, 1); the system 
behaviour in the pathological cases can always be understood as specific limits and/or 
degeneracies of the more generic solutions. Here the order parameters {m,k 7 ^/(x)} 
are to be found by analyzing the solutions for n — > 00 of the following equations, where 
the complications are mainly in the subtle dependence of the distribution ^>(x) on n: 

Jdx' fd£ p(£)*0') f d V w(r])6[x- A(x' ' + J p m£,r]J s )]e n ^ B ( x '+ J p m ^ r i J ^-^ 
fdx' fd£ fdr] w^e^^^'+Jp^^-^ 

(80) 

J d£ J dxdy y{x)^{y) tanh[/3( J p m£+x+y)} cosh"[/3( J p m£+x+y)} 

Jd^p{OJdxdy^(x)^(y)cosh n [P(J p m^ + x + y)} { ' 

= /dg g(Qg Jdxdy VjxMy) cosh"[/3( J p mj + x + y)] 

Jd^p(OJdxdy^(x)^(y)cosb n \j3(J p m^ + x + y)] ( ' 



A solvable model of the genesis of amino-acid sequences via coupled dynamics 



19 



with the abbreviations 

w U\ e n/3C(.Jp-Jg)(k-k ) 

p ^> = J$Ti w ^ynK'(j p -.j g )(k-k a ) ( 83 ) 

fco =(k*-n/J g )/(l-J p /J g ) (84) 

Once the above equations have been solved for n — > oo, the associated values of the 
free energy per monomer subsequently follows upon taking the n — > oo limit in (|54|) . 

5.1. The two simple cases J s — and J p m = 

In both these special cases our problem simplifies significantly due to 4" (x) = 6(x) (a 
property which has been established earlier). If we define 

L(0 = ^ log cosh(/3 J p md) +Z(J p -J g )(k-k ) (85) 

we see that our remaining equations for to and fc reduce to a simple form in which for 
n — ► oo the integration over £ is dominated by the maximum of £(£), subject to the 
constraint £ G [— 1, 1] imposed by the measure w(£): 

Jd(wm^nh((3J p mQe n ^ 

fd£ w(Ote nl3L ^ , . 

fc = Urn J r ^ v ;;; 87 

If J p m ^ then £(£) is maximal either for £ = sgn[(J p — Jg){k — fco)] (if linin^oo fc 7^ 
fco), or for £ = ±1 (if limn^oo fc = fco). In either case one has £ € { — 1,1}, so we 
always find for n — 00 the simple Curie- Weiss law to = tanh(/?J p m) which describes 
a transition to secondary structure at T = J p . Our equation for fc, on the other hand, 
will produce for n — » 00 only solutions of 

fc = sgn[(J p - J 9 )(fc-fc )] (88) 

(this includes the case fc = fco). Graphical inspection of this equation shows 
immediately that for J p < J g the only solution is fc = fco, whereas for J p > J g 
we have the additional solutions fc = ±1. If J p m = then L(£) is either maximal for 
£ = sgn[(J p — J g )(k — fc )] (if lim n _>oo fc 7^ fco), or it is a constant on £ e [—1, 1] (if 
lim ri _, 00 fc = fc ). Here one has £ € {— 1, 1} only if fc 7^ fc . 

Working out the free energy per monomer (f54|) gives, using B(x,y) — B(\x\, \y\) 
and the property that S(|x|, \ y\) increases monotonically with both \x\ and \y\, 

ip = \-J v m % + \j g k* 2 + ^{J P - J g )k 2 

_ li m J_log fdn W (n) [ d£ w {^)e n ^^- J ^ k - k ^ +B ^^ m ^ J ^ v ^ 

n^oo fin J J 

= \jp™ 2 + \h k * 2 + \{J P - Jg)*? 

- max U(J P - J g )(k-k ) + B(J p m£,r]J s )-vr)\ 
£,7)e[-i,i] i- > 

= ij p rn 2 - B(J p \m\, J,) + \jgk* 2 ~ \v\ + \{J P -Jg)k 2 - \J P -Jg\\k-k \ (89) 

where the maximum corresponds to r\ = — sgn(i/) and £ = sgn[( J p — J g )(k — fco)]. The 
last line reveals that in cases where we have multiple solutions, viz. J p > J g , the 
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solution k = k a is always a local maximum of ip and k = ±1 are always local minima. 
Of the latter two, the lowest free energy is found for k = — sgn(fc ) (this is therefore 
the state that it not only locally stable but also thermodynamically stable). Therefore 

J p > J g : k = ±1, J p < J g : k = k (90) 

This implies a discontinuous phase transition at J p — J gi where we go from k = ±1 
(homogeneous polarity sequences) to k = k e (—1,1), where the sequence becomes 
inhomogeneous in polarity. 

If we calculate the distribution W(£, h) for the above solutions we always find 
W(£, h) = 7t(£)<J(/i), but with potentially different polarity statistics. For the k = ±1 
states one has 7r(£) = <5(£ — k). For the k = k solution, however, we need to look 
beyond the leading order and write k — k + n~ 1 ki +0(n~ 2 ). Here we find for n — > oo: 

with k\ to be solved from 

This concludes our solution for the simple cases J s = and J p m = 0. From now on 
we consider the case where J s > and J p m ^ 0. 



5. 2. Summary of the n — > oo theory 

The full analysis of our order parameter equations in the limit n — + oo via saddle- 
point analysis, for arbitrary (J s ,J p ), turns out to be nontrivial; details of this 
calculation would interrupt the flow of the paper and have therefore been delegated 
to |Appendix B[ The end result, however, is surprisingly simple. We can summarize 
the final equations for our order parameters (k, m) describing the systems states as 
identified in the limit n — > oo in the following compact way: 

J g >J p : k = k , m = or Fpj p (m) = — tanh(/3J g ) (93) 

J p >J g : k = ±1, m = or Fp j p (m) = sgn(i/) tanh(/3 J s ) (94) 

in which the function F x (m) is defined as 

, s tanhfizm — i tanh _1 (m)l , 

F x {m) = ^ f _ v /J (95) 

tanhfgaJTO + o tanh (m)] 

Only the solutions with k = ko as obtained for J g > J p correspond hetero-polymers 
with inhomogenous polarity along the chain, i.e. to systems of the protein type. 
The solutions with k = ±1 (with k = — sgn(fco) being also thermodynamically stable) 
describe a situation where the sequence selection results in polymers with homogeneous 
polarity. For J p > J g we have two further conditions (|B.26|B.27|k these are always 
satisfied for m = 0, but may be violated by saddle-points for which \m\ is too large. 
We observe that for v < the homogeneous polarity states and the inhomogeneous 
polarity states exhibit fully identical levels of secondary structure (as measured by 
rn), for any combination of 0J P and (3J S . Here is it therefore also easy to show by 
comparing the two free energy expressions (|B~29f and (|B~45|) that for J p > J g the 
free energy per monomer of the k = ±1 state is lower than that of the state k = ko, 
whereas for J g > J p the free energy of the k — kg state is lower. For v > 0, however, 
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Figure 4. The function F x (m) for x e {§,1, §,2, ~,3, |,4, |,5} (from bottom 
to top). Solving the equation F x (m) = y for m can give at most one positive 
solution if x < \/3, where [d 2 F x (m)/dm 2 ]| m =o < 0- It may have two positive 
solutions if x > \/3, where [d 2 F x (m) / 'dm 2 ]\ m= o > 0, provided one also has 
y > F x (0) = (x — l)/(x + 1). For sufficiently large y the equation F x (m) = y will 
no longer have any solutions. 

the two states no longer have identical values of m, with that of the k — ±1 state 
being lower; here the system finds it increasingly difficult to combine homogeneous 
polarity sequences with secondary structure. 

Let us inspect the bifurcation phenomenology for the order parameter m. Note 
that Fo(m) = —1 for all m € [—1,1], and that ^^(m) = 1 for all m € [—1,1]. For 
x > the function F x (m) is symmetric in m, with F x (±l) = —1 and with 

F x (m) = ^ - - 2 #-^| + 0(m 4 ) (96) 
x + 1 3(a; + 1)^ 

(see also figure 0J. In view of the symmetry F x (— m) = F x (m), we conclude that 
(depending on the values of (x,y)), the equation F x (m) — y has either zero, two 
(±m*), or four (±m*,±mo) nontrivial solutions in m. 

In the (x, y) plane, where x = /3J p and y — tanh(cr/3J s ) with cr = ±1 (so 
a = sgn(f) for J p > J 9 and a = —1 for J g > J p ), the bifurcation scenarios for 
our saddle-point equation F x (m) = y can now be summarized as: 

x < V3 : continuous transition at y c = (x— l)/(x + l) 

y <y c : m € {0, ±m*(x)} 

y > y c : m = 

x > V3 : continuous transition at ?/ c = (x— l)/(x + l) 
y <y c : m e {0, ±m*(x)} 
y>y c : m € {0, ±m (x), ±m*(x)} 

discontinuous transition at y c > (x— l)/(x + l) 
y < y' c : m{0, ±mo(x), ±m*(x)} 
y > y' c : m = 
The result is shown in figure [5] 
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x < -\/3 x > -\/3 




Figure 5. The bifurcation scenarios for the solutions m of the equation F x (m) = 
y, with x = (3J P and with y = sgn(V) tanh(/3J s ) £ [—1,1] for J p > J 3 and 
y = — tanh(/3J s ) £ [—1,0] for J 9 > J p . Solid lines correspond to stable solutions 
(local minima of the free energy), whereas dashed lines correspond to unstable 
ones. The trivial solution m = changes stability at /3J S = isgn(^) log(/3J p ) for 
J p > J g and at f3J a = i log(/3J p ) for J 9 > J p . 

5.3. Phases, transition lines, and phase diagrams 

We can characterize the phases of our system for n — > oo in terms of the values for the 
order parameters (k,m), where k provides information on the primary structure (the 
average polarity) and m provided information on the secondary structure (the extent 
of order in the side-chain orientations). The system is found to exhibit five phases: 

HS ('homogenous & swollen'): k = ±1, m = 
primary structure but no secondary structure, 
selected sequences are homogeneous in polarity 

HC ('homogenous & collapsed'): k = ±1, m/0 
both primary and secondary structure, 
selected sequences are homogeneous in polarity 

HM ('homogenous & mixed'): k — ±1, coexistence of m — and m^O 
primary structure, with secondary structure controlled by remanence, 
sequences are homogeneous in polarity 

IS ('inhomogenous & swollen'): k — ko, m — 
primary structure but no secondary structure, 
selected sequences are inhomogeneous in polarity 

IC ('inhomogenous & collapsed'): k = k , m ^ 
both primary and secondary structure, 
selected sequences are inhomogeneous in polarity 

There is no random (paramagnetic) phase m = k = 0. This is a consequence of the 
n — > oo limit: since the noise in the genetic selection (viz. mutations) is removed, 
there is at least always a primary structure developing as measured by k ^ 0. 

Similarly, we can summarize the transitions we have by now identified: 
• HS— TS and HC— TC: discontinuous transitions, at 

Jg = J P (97) 
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The HS— >IS line is found in the regime of small values of J p . The HC— >IC line is 
found for large values of J p . Along the latter line, if v < only k is changed at 
the transition, if v > both k and to are changed. 

HS— >HC, IS— >IC, and HC^HM: continuous transitions, at 
f |sgnMlog(/3J p ) if J p >J g 

\ -§logQ9J P ) if J g >J p V ' 

The HC^HM line exists only when J p > J g and v > (where the coexistence 
phase HM is found). 

HS^HM: discontinuous transition, to be solved from the coupled equations 
tanh[i/3J p m — | tanh _1 (m)] 



tanh(/3J p ) (99) 

-^H^p"" i 2 r """ : '" : 



tanh[i/3 J p m + i tanh 1 (m)] 
l-tanh 2 [i/?J p 7n-i tantrum)] /3J p (l-m 2 )-l 



tanh(/3J s ) (100) 



1 - tanh 2 [i/3J p m + i tanh" 1 (to)] /3J p (l-m 2 ) + l 

where the second equation is obtained from combining Fpj p (m) = tanh(/3J s ) with 
7^Fp.i P { m ) = 0- This line starts at the triple point ((3J p ,f3J s ) = (V3, -|log3) 
in the (f3J p ,/3J s ) plane, and rises continually for (3J p > \/3- It emerges only for 
J p > J g and v > (where the coexistence phase HM is found). 

At the continuous transition (|9"5)) the to ^ state always takes over the stability from 
the trivial one. This can be seen upon expanding the two free energy expressions 
(|B.29IB.45[) for small to. For both expressions this gives 

(3(<p-<p m =o) = iTO 2 (/3J p + l) 2 {tanh 2 (/3J s ) - (jf^f) + 0(m 3 ) 

Although both are co-located and are continuous in the fundamental order parameters 
(to, k) , there is an important difference between the HS— >HC and the IS— >IC 
transitions, which involves the behaviour of the polarity distribution 7r(£). As one 
crosses from HS into HC, 7r(£) remains unchanged, taking the value 7r(£) = <5(£ — k) in 
both states. In contrast, we know from (f9"Tj) that the IS state has a continuous polarity 
distribution 7r(£) = Jdh W^(£, h) whereas the IC state has the binary distribution 
tt(£) = i(i + fc )<J(£_i) + ±(l-fc )<5(£ + l). Thus, the transition IS^IC is in fact 
discontinuous, in spite of it involving no jump in the order parameter to itself. 

Upon translating our results into the original control parameters (3J P and [3J S 
one obtains the phase diagram cross-sections shown in figures [6] and [7] The phase 
where compact (to ^ 0) and swollen (m = 0) states coexist will be characterized by 
strong remanence effects. The thermodynamic transition line (calculated by selecting 
the solution with the lowest free energy) coincides with the second order transition 
for (3 J p < -\/3, and will be found inside the coexistence region for (3 J p > %/3. 

Without noise (i.e. random mutations) in the sequence selection process, (3 = oo, 
we can summarize the behaviour of the system as follows. For J p > J g it always finds 
itself in states where any infinitesimal functional advantage of either the hydrophilic 
or the hydrophobic monomers leads to amino-acid sequences that are, unlike proteins, 
fully homogeneous in their polarity. The phenomenology described by the remaining 
equations for to and the resulting phase diagram reflect the interplay between the 
tendencies of the polarity-homogeneous system to have similarly oriented amino-acid 
residues (induced by the long-range forces) and low steric energies (induced by the 
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Figure 6. Phase diagram cross-section for n — + oo (deterministic sequence 
selection) for the cases where either J g > J p (protein-like inhomogencous polarity 
sequences, k = fco) or where J g < J p (homogeneous polarity sequences, k = ±1) 
but with v < 0. Solid line: transition marking the continuous bifurcation of 
collapsed (m ^ 0) states, although for J g > J v this transition is discontinuous in 
the polarity statistics. Phases are defined and described in the main text. 



T/J P 




Figure 7. Phase diagram cross-section for n — > oo (deterministic sequence 
selection) for the case where J p > J g and v > (homogeneous polarity sequences, 
unlike proteins). Here the system is unable to minimize steric and polar energies 
simultaneously. Solid line: the continuous transitions between swollen (m = 0) 
and collapsed (m ^ 0) solutions. Dashed: the discontinuous transition. Phases 
arc defined and described in the main text. 



short-range forces). The system behaves as an Ising chain with random short-range 
bonds and uniform long range bonds. In those cases where the amino-acids are forced 
by steric effects to have non-identical side chain orientations (i.e. for v > 0) there is 
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a complex competition between long range and short-range order, which leads to low 
values of \m\ and strong remanence effects, in sharp contrast to the situation in mean 
field models [35]. In contrast, for v < both the long range and the short range forces 
promote similar side chain orientations; the absence of frustration is responsible for 
the absence of remanence effects and for having large |m| (strong secondary structure). 
For J g > J p it is no longer energetically advantageous to select chains with uniform 
polarity, and here we find the protein-like states. The polarity inhomogeneity of the 
sequence reduces dramatically the energetic impact of the long-range forces compared 
to the case k ± 1, and this decouples the strength \m\ of the secondary structure from 
any preference for aligning or anti-aligning short-range forces, as controlled by v. 



6. Transitions and phase diagrams for non-deterministic sequence 
selection 

In this section we extract solutions, transition lines and phase diagrams from our 
order parameter equations for non-deterministic selection of primary sequences, viz. 
finite n. Full analytical solution of our equations is generally ruled out, so we restrict 
ourselves to the study of instabilities and to collecting further information on phases 
by solving our equations numerically. As in the previous section we restrict ourselves 
to simple parameter choices, in particular we take v(u) = ^u 2 and ko 6 (—1, 1). 

6.1. Continuous transitions away from m = 

We first derive exact conditions marking continuous phase transitions away from the 
state m = without secondary structure as defined and studied earlier, for arbitrary n. 
For m = one has ^>(x) = Q(x) = S(x), and fc is to be solved from (|57| . We make in 
our order parameter equations (|45I47I50I51I52[) the substitutions Am, fc— >k+Ak, 
and if?(x)— >5(x) + A 1 $(x). We next expand these equations in {Am, Afc, A^(x)} and 
locate their linear instabilities. In doing so we may use k = fd£ p(0£j which holds 
for m = 0. In practice it turns out somewhat easier to involve also the auxiliary 
distribution &(x), and replace ([47|) by the pair (|48|49|) . First, substitution in and 
expansion of equations (|4"5"1) and P5| gives 

Ap(0 =n[3(J p -J g )(t~k)p(0Ak + O(A 2 ) (101) 

A$(x) = A*(a;) - J p kS'{x)Am + C(A 2 ) (102) 

These results are then substituted into (|48| . which leads to an equation for A^(x): 

A*(x) = (103) 

Jdx'[AV(x')-J p kAmS'(x')]Jdr) w(ri){5[x- A{x\nJ s )]-5{x)}e n ^ x '^ J ^ lJ ^ 2 

fd v w ( 7? ) e ™/3[s(o, J) J 3 )-^] +0(A ) 

We next separate A^(x) into its symmetric and anti-symmetric parts, A^>(x) = 
A^s(ie) + A^a(x), giving up to order A: 

_ /^Avl/ g (x')/d7 7W (7 ? ){^[.T~A(^,7 ? J,)]-^(x)}e"^ i? ^'' J -)-^] 

Jdx'[A^ A (x')-J p kAmS'(x')]Jdr] w(r])S[x-A[x', ^j s )] e ^[s(x',nJ»)-^] 
^ ~ Jdn w ( f] )e^[B(o^.j s )- vv ] 

(105) 
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The symmetric and anti-symmetric parts obey independent equations, and only the 
anti-symmetric part ^a(x) is coupled to the bifurcation of m ^ 0. Apparently, any 
nonzero solution of equation (|104[) describes transitions from one m = state to 
another, whereas equation (| 1 05|) controls the bifurcations away from m = 0. 

In order to expand equations (|50I5 1 1) for the scalar order parameters we need to 
vary the distribution W(£, h) defined in (|52|) . which we first rewrite as 

= p(Q cosh n (Ph) J dxdy V(x)V(y)6(h-J p mt-x-y) 
Jdep{^')Jdxdy^(x)^(y)cosh n [(3{J p mC+x+y)] 
Upon varying this equation around the m — state we then find 

AW(£, h) = Ap(£)6(h) + 2p{£) cosh n (/3h)A^(h) - J p Am £p(£) cosh n (/3h)5'(h) 

- 2p(Z)6(h) [dyco S h n ((3y)A^ s (y) + 0(A 2 ) 



= p(0{nP(J P -J g )(^-k)S(h)+cosh n (l3h)[2A^(h) - J p £Am S'(h)} 

- 25(h) J dy cosh"(/3?;)A*s(y)} + 0{A 2 ) (106) 
Insertion into (|50I51|) then gives, using Jdh tanh(/3/i) cosh n (/3h)6' (h) = —(3: 

Am = 2k Jdh tanh^/i) cosh n (/3/i)A# A W + f3J p Am J d£ p(0£ 2 +O(A 2 ) (107) 

Afc =np(J p -J g ) Jdt, i 2 p(i)-k 2 Ak + 0(A 2 ) (108) 

As expected, the perturbations Am couple only to the anti-symmetric part of A^(x); 
the m/0 bifurcations are the instabilities of the coupled pair (|105I10T|) . Furthermore, 
equation (| 108[) for Afc does not depend on the symmetric part of A\&(x), so we may 
for the purpose of studying continuous transitions away from the m — state regard 
5fy(x) as strictly anti-symmetric and extract instabilities involving k only from (|108j) . 

It turns out that the (anti-symmetric) functional perturbation A^>a(x) that solves 
equation (|105|) can be expressed in terms of Am. We show this by substituting for 
A 7^ 1 the ansatz 

\ T k 

A^ A (x) = ^4 s '( x ) Am ( 109 ) 
A — 1 

into the leading orders of p05p . Using integration by parts and the properties 
d x B(x, y)\ x =o = an d d x A(x, y)\ x =o = tanh(/3y) this is found to give 

/ dx'S'{x') J drj w(r])S[x - A(x', r} J 8 )]e n ^ B{ - x '' 7 > J ^- v ^ 



\8'{x) = - 



fdr) W ( 7? ) e ™/3[S(0,»;J S )-^')] 

Jdx'S(x')Jdn wiri^npSix-Aix^ijJs^^Bix'^Js^e^ 13 ^^-^] 

fdr] w( r q)e n l 3 ^ B< fi'^ J ^~ vl T\ 
Jdx'S{x')Jdn w{7 1 )^8'[x~A{x\r,J s )]^A{x\7 1 J s )Y n0[B(x '' nJa) ~ vll] 

Jdn w ( 7? ) e ™/3[B(0,r ) J s )-^r ) ] 

Jdn w(rj) tanhjp^e"^ ^-^ 
(X) Jdn w (rj) e nfl[B(o, v J.)-vT,] ^ 1W > 
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Figure 8. Continuous bifurcations from swollen (m = 0, HS) to collapsed 
(m ^ 0) states, for several n values around n = 2, for the case where J p > J g 
and v > (homogeneous polarity sequences, unlike proteins). The corresponding 
curve for n = oo is shown in figure We see that, if there were no discontinuous 
transitions, reentrance would occur upon lowering T for n > 2, where for n < 2 the 
continuous transition temperature is monotonic in J B /J P . This suggests strongly 
that there is a discontinuous bifurcation to HM phase for n > 2, but not for n < 2. 



This confirms that (| 109() indeed solves our bifurcation equation, with 



A = 



/ dn w{rj) ta,nh((3r]J s )e n ^ B< - ^-^ 



(111) 



Jdn w(r])e n l 3 i B ( ' r i J ^^^ 

This result allows us to compactify our bifurcation conditions further. Upon 
substituting (| 1 09[) into (|107p and carrying out the remaining integral, we obtain the 
following simple set of bifurcation conditions: 

2Xk 2 - 



where 



Am ^ 
Ak ^ : 

p(0 



l=(3J p d^ 2 p(0- 



l = n(3(J p -J g ) 



A-l 



n^(J p -J g )(k-k ) 



(112) 
(113) 

Jd^ w (^)e n f J i'( J p~ J 9)(k~k ) ( 114 ) 

For f3 = 0, infinite temperature, the right-hand sides of (| 1 1 2[) and (| 1 13[) are zero. 
Hence the physical transitions occur at the highest temperature for which the right- 
hand sides have increased to the value 1. If the first transition to take place is (I113|) . 
then m will remain zero and equation (|112[) will still apply to predict a further ra/0 
transition. If pi2[) is the first transition to occur, then (| 1 13|) will no longer apply. 



As a simple but nontrivial test we can recover from (|112I113[) our earlier 
predictions for the limit n — > oo. Taking n — > oo in (|llip gives the simple result 
lim^oo A = — sgn(v) tanh(/3J s ). In the HS, HC and HM phases we have J p > J g 
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and k = ±1, so lim„^ 00 p(^) = — k) and therefore lim„_ 1 . 0o J d£ £ 2 p(£) = 1. This 
simplifies condition (|112p for the continuous bifurcation of m in the k = ±1 phases 
to the expression found earlier in analyzing the n — > oo equations, as it should: 

/3J s = isgn(z/)log(/?J p ) (115) 

For the k = fc states the m^O bifurcation is discontinuous, involving a jump in the 
polarity statistics as measured by 7r(£); so there equations (|112lll3p do not apply. 

As an application of (|112I113|) we have solved these equations numerically for 
Jg/Jp = v = | and fc = 0, to investigate the effect of genetic noise on the phase 
diagram in figure [7J (although this is the biologically less relevant case of polymers 
with homogeneous polarity, it has the more interesting phase diagram). The result 
is shown in figure [5J Although based on equations that only apply to continuous 
transitions, the figure allows us to predict on topological grounds that discontinuous 
transitions will occur for n > 2. This is a remarkable result: the critical value n = 2 for 
the onset of first order transitions was found persistently in earlier coupled dynamics 
models [3TJ [32j [33l [34] , but since these did not involve short range forces, its re- 
appearance in the present model strongly suggests an unexpected universality which 
at present we do not understand. 

7. Numerical results 

7.1. Numerical solution of order parameter equations via population dynamics 

The goal of this section is to verify numerically the phases predicted in previous 
sections, and to provide phase diagrams for those cases where solutions of equations 
(|47I50I51I52[) for the observables m, k and ^(x) can not be found analytically. To limit 
the number of control parameters to be varied we choose J s = 0.1, J p = 1, [i = J p k* 
(so ko — k*), and k* = 0.7 throughout, since this still allows us to probe all the phases 
in figures [5] and [7J We followed the mathematically related studies [SSJ [37J [351 and 
solved the functional equation (|47p using a so-called population dynamics algorithm 
(with a population of size 10 ), which exploits the interpretation of such equations as 
fixed-point conditions for a suitably chosen stochastic process for the local fields. 

We turn first to the most important and realistic case of (near-)deterministic 
sequence selection, where for n — > oo we expect to recover the behaviour shown in 
the phase diagrams of figures [6] and [7] Here we face the practical problem that in 
our equations n appears usually in exponents, which limits our numerical analysis to 
values n < 400. It turns out that to observe the n — ► oo predictions one needs values of 
n that are significantly larger than this; furthermore, for large but finite n the limiting 
values of transition temperatures and the nature of the various transitions can vary 
significantly from one phase to another. In figure [9] we present numerical results for 
positive v, where the steric forces make it energetically favourable for adjacent amino- 
acids to have different side chain orientations. We plot the order parameter m versus 
temperature (left panel) to locate the IS— TC phase transition, which for n — > oo 
was predicted to be continuous, and which for the present control parameters should 
occur at T c = 1.183. It turns out that for large but finite n the transition is in fact 
discontinuous and at a lower temperature than the n — > oo one. However, a study of 
the asymptotic scaling with n of the transition temperature, within the numerically 
accessible regime, confirms that for n — > oo the correct value is found, see figure [5] 
(right). The observed strong dependence on n of the exact location of the transition is 
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Figure 9. Left: dependence of order parameter m on the folding temperature 
T, obtained by numerical solution of the order parameter equations, for control 
parameters (J s ,J p ,J g ) = (0.1,1,2), fco = 0.7, n = 0.2, v = 0.5. The relative 
genetic noise levels n = T/T were n = 100 (connected triangle), n = 200 
(connected circles) and n = 400 (connected squares). According to our earlier 
analysis, for n — > oo the phases should be those shown in figure[7] For the present 
values of control parameters this predicts for n — * oo a continuous transition from 
m (IS phase) to m = (IC phase) at lim n ^oo T c = 1.183 (shown as a vertical 
dashed line). Right: the IS— >IC transition temperatures T c shown versus 1/n, 
for the same values of the remaining control parameters. The data are perfectly 
consistent with the analytically determined value lim n _>oo T c = 1.183 (dashed). 

remarkable; the system appears to be very sensitive to the ratio of temperatures of the 
two coupled processes, and the deterministic regime is achieved only asymptotically. 
For n = 100 the location of the transition point differs by more than 10% from its 
tl — > oo value. If one carries out a scaling analysis of the magnitude of the jump in m 
found at the transition temperature for large but finite n, one finds that for n — > oo 
this jump will indeed vanish, in agreement with our previous asymptotic analysis. 

Upon carrying out a similar analysis for negative values of v, where steric forces 
are such that adjacent amino-acids prefer identical chain orientations, the resulting 
graphs and the physical picture are similar to those of v > 0. For large but finite n 
the phase transition is again discontinuous, and a scaling analysis shows once more 
good agreement with the theory in the limit n — > oo. However, there is an important 
difference between the cases v > and v < which concerns the sub-leading orders 
in n^ 1 for the state k — kg, as n — » oo, which is reflected in both the field distribution 
ty(x) and in the order parameter k close the transition. This is an important success 
of the population dynamics algorithm, which allows us to evaluate in a simple and 
straightforward way the distribution $f(x) of the short-range contributions to the local 
effective fields. In addition, it is a crucial test to verify the scaling of the sub-leading 
orders in predicted by our theory. Figure [T0l shows how m and k behave close 
to the transition. One notes that the (discontinuous) behaviour of m is qualitatively 
similar in both cases, whereas the polarity k behaves in a very different way: in 
contrast to v < 0, for v > there is a noticeable (small) jump in k at the transition. 
This can be explained if we assume that for v > the solution scales in a different way 
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Figure 10. Dependence of order parameter m (connected squares) and k 
(connected circles) on the folding temperature T, obtained by numerical solution 
of the order parameter equations, for control parameters ( J s , J p , J g ) = (0.1, 1, 2), 
ko = 0.2, and fj, = 0.7. In both graphs the relative genetic noise level is 
n = T/T = 200. Left graph: v = 0.5 (promoting different orientations for 
adjacent amino-acids). Right graph: v = —0.5 (promoting identical orientations). 
The large n theory of the previous section predicted that the sub-leading order in 
n for the k = fco solution (as shown here) is 0(ra~ J ) when v < 0, but 0(n -1 / 2 ) 
when v > 0. The numerical data shown here are consistent with this prediction. 





Figure 11. Distribution ty(x) of the short-range contributions to the local 
effective fields, for (J s ,Jp,J g ) = (0.1,1,2), n = 200, fco = 0.2, fj, = 0.7, and 
T = 1.07, as obtained via a population dynamics algorithm. Left: v = 0.5. 
Right: v = —0.5. Since for n — » oo the function ^(x) is symmetric, so these 
results confirm that finite-n effects are more profound for v > (where they are 
predicted to be 0(n~ 1,/2 )) than for v < (where they should be 0(n -1 )). 



with n 1 . Close inspection of the jump shows that this jump for v > is indeed of 
order 1 / y/n, again in perfect agreement with the theory. The difference between the 
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Figure 12. Phase diagram cross-sections for n = 1 (strongly noisy sequence 
selection), fi = 0.7, and J g /J p = 2. Top curve: v = — 1 (promoting identical 
orientations of adjacent amino-acids). Bottom curve: u = 1. (promoting opposite 
orientations). Solid line: transition marking the continuous bifurcation of m ^ 
states. Phases are defined and described in the main text. 

regimes v < and v > is also observed in the distribution ty(x) of the short-range 
contributions to the local effective fields; see Figure fTTI 

Although less relevant from a biological point of view, it is interesting to compare 
the phase diagrams of n — > oo (or at least large), describing (near-) deterministic 
sequence selection, to those one would have found for very noisy sequence selection. 
An example is shown in Figure 1121 for n = 1 . Compared to the phase diagram of 
Figure El we see that in the presence of high genetic noise the impact of the short 
range forces, as measured by J s is reduced drastically (note the different vertical 
scales), with as expected a corresponding reduction of sequence selection specificity. 

7.2. Numerical simulations 

The theory presented in this manuscript makes a large number of predictions about the 
cooperative long-time behaviour of the polymeric chain. In some asymptotic limits it 
is possible to work out the expressions for the relevant order-parameters of the system 
and find simplified algebraic equations which allow to plot phase diagrams. In other 
cases we had to rely on population dynamics algorithms to solve our functional order 
parameter equations and detect the relevant transition lines. 

In order to have independent tests of our formulae we have also performed 
Monte-Carlo simulations of the stochastic processes that would lead to equilibration 
with the Hamiltonians (fTJ) and ([5]). This is of course the cleanest way to check the 
theory. However, due to the special character of the coupled dynamics which requires 
nested equilibrations of two complex processes at widely separated time-scales, these 
simulations are highly nontrivial and extremely time consuming, and one is severely 
limited in both the number and the precision of simulation experiments that can be 
completed reliably. A systematic scan of all possible parameter regimes is certainly 
ruled out. Instead we focused on the regime n — > oo (genetic evolution of sequences 
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Figure 13. Results of numerical simulations of the coupled stochastic processes 
of (fast) folding and (slow) primary sequence selection, for N = 1000, at T = 0.3 
and n = 200. Further system parameters: v = J g = ^, J p = 1, and and 
k* = 0.7. For n — > oo one expects to find the phenomenology of Figure [7] with 
k = — 1 and with a region where m = (swollen) and m ^ (folded) states are 
simultaneously stable; for n = 200 one expects this to remain true but with shifted 
values of J B /J P . The left picture shows the equilibrated values of m, found upon 
increasing J s in stages from below (full circles) and alternatively upon decreasing 
J g in stages from above (open circles). It confirms that there is a coexistence 
region at the predicted range of values for J s /J p . The right picture, measured 
at Js/Jp = 0.25, shows the evolution in time of m (upper) and k (lower), upon 
initializing the system in the folded state that is stable for lower values of J B . It 
suggests that the chosen duration of 5.10 4 iterations per monomer suffices in the 
present parameter regime to achieve equilibration. 



at low noise levels). This is not only the most relevant one biologically, but is also 
the regime where our predictions take their most explicit form, as here we could go 
beyond population dynamics analyses. In particular, we chose to invest our computing 
resources in verifying the existence and location of the predicted coexistence region in 
the phase diagram shown in Figure [JJ 

We simulated the coupled Monte-Carlo dynamics associated with ([1]) and for 
n = 200, with v = J g = |, J. p = 1 (so Figure [7] is predicted to apply at least in 
the limit n — * oo) and k* = 0.7, for a system of N = 1000 monomers at folding 
temperature T — 0.3. We employed careful on-line tests to ensure equilibration of 
folding angles before carrying out monomer substitutions (i.e. genetic updates), and 
we allowed for 5.10 4 iterations per monomer. For these parameter choices our n — > oo 
theory predicts that always k = — 1, and that there are two critical values for J s : 
one should find m ^ (a folded state) for J s < 0.181, m = (a swollen state) for 
J s > 0.209, with coexistence of the m = and m ^ states for 0.181 < J s < 0.209. 
For large but finite n (here: n = 200) one should expect on the basis of the earlier 
data in e.g. Figure [9] to observe a shift of about 10% in these critical values relative 
to those of n — > oo. The results of the simulation experiments are shown in figure 
1131 Each individual point in the left figure represents the outcome of a simulation 
where both the fast and the slow process have equilibrated. The figures confirm the 
existence of a coexistence region, with critical J s values compatible with the predicted 
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10% shift relative to those calculated for n — > oo. The associated value of the order 
parameter k is indeed k = — 1. The graph showing the evolution in time of the scalar 
observables illustrates for J s /J p = 0.25 how the to ^ destabilizes if J s has become 
too large, and supports the claim that in the present parameter regime our simulations 
have equilibrated sufficiently. The remaining fluctuations in to are finite size effects, 
of the expected order Am ~ N~ 1 / 2 . 

8. Discussion 

In this paper we have studied the coupled stochastic dynamics of primary and 
secondary structure formation (i.e. slow genetic sequence selection and fast folding) in 
the context of a solvable microscopic model that includes both short-range steric forces 
and and long-range polarity-driven forces. The rationale behind our approach is that 
it allows us to circumvent the basic obstacle in the application of disordered systems 
techniques to protein folding, which is the need to specify in a mathematical formula 
the statistics of the disorder, i.e. the statistics of the amino-acid sequences. Here this is 
not necessary, the sequences are themselves allowed to evolve in time, albeit slowly (to 
model genetic selection) and in a manner that takes account of the folding properties 
of the associated chain, and the statistics of sequences are now an implicit output of the 
model rather than an input. Our solution is based on exploiting recent mathematical 
progress [36l [37] in the diagonalization of replicated transfer matrices, and leads in 
the thermodynamic limit to explicit predictions regarding phase transitions and phase 
diagrams at genetic equilibrium. 

In order to apply the methodology of replicated transfer matrices (which require 
a formulation in the form of a pseudo-one-dimensional system) we limited ourselves to 
effective Hamiltonians of a type that represents the physical feasibility and energetic 
gain of three-dimensional folds indirectly, as in e.g. [26]. Even then, in order to 
keep the remaining mathematics manageable, we chose to limit ourselves further by 
retaining only polarity forces and steric forces, we reduced the orientation degrees of 
freedom of individual monomers, and we made the simplest statistical assumptions 
regarding polarity and steric properties of amino-acids. However, in contrast to 
the above limitation to pseudo-one-dimcnsional models, these latter restrictions and 
choices are not strictly required and can in principle be lifted if one is willing to accept 
the inevitable associated quantitative increase in mathematical complexity. Even in 
its reduced form, our model and its solution still have a large number of control 
parameters to be varied, and a full exploration of its phase phenomenology would 
have required more than double the present page numbers. Instead we have largely 
focused on the regime which we believe to be the most relevant one biologically: the 
large n regime, where the genetic noise is low. We have tried to explain the phases 
observed and their transitions, and understand these qualitatively. 

Our model was found to exhibit a parameter regime where protein-like behaviour 
is observed, i.e. where the genetic selection results in inhomogeneous polarity 
sequences, and where the folding process describes transitions between swollen and 
collapsed phases. There was also a parameter regime where the genetic dynamics 
leads to polymers which are homogeneous in polarity. However, this un-biological 
behaviour requires unphysical values of the control parameters. There is a simple 
argument to see this. The reason for the energetic advantage of homogeneous polarity 
sequences is the mean-field contribution — (J p /N) J^. £(Ai)£(Aj) S < p u( f, j to (pj, which 
even for completely random angles {4>i\, where (S^^) — q^ 1 , retains on average a 
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value —J p N(N^ 1 £(Ai)) 2 . In random hereropolymer models with frozen sequences 
this term is irrelevant, but here the sequences {A^} evolve, so the system can reduce 
its energy by increasing |7V -1 J^i A rational alternative definition would be to 

replace the mean field term in (JTJ by —(J p /N) J2ij ^(^»)C(^i)[^<,^ — <7 _1 L expressing 
energy gain via folding in terms of correlation between side-chain orientations and 
polarity, rather than covariance. This would generate a term similar to the polarity 
balance energy, and result in the replacement J g v(k — k*) — > J g v(k — k*) + J p k 2 /q. 
For the simple choices q = 2, t>(x) = ^x 2 , and k* — 0, in particular, the change would 
translate into the simple parameter re-scaling J g — > J g + J p . The natural parameter 
regime is apparently J g > J p , the one with inhomogeneous polarities. 

There is certainly significant scope for improvement and expansion of this study. 
All our simplifying choices, made for the sake of mathematical convenience, should 
however be judged in the light of the complexity of the resulting equations even for 
the presently studied simplified model. The obvious directions to move into next are 
clear. First there is the search for more realistic Hamiltonians describing the fast 
process, by improving the energetic description of the effects of 3D folding (possibly 
via a formulation involving contact maps, which would replace the long range all- 
to-all forces by a sparse connectivity version), and by including hydrogen bonds. 
Second, we would like to work out our formulae for the case where the monomers' 
mechanical degrees of freedom consist of two angles, that furthermore can each take 
more than just two values (preferably a continuum, which would replace the replicated 
transfer matrices by replicated kernels). Thirdly, one would like to find more realistic 
alternatives for the sequence selection Hamiltonian, that is more precise in terms of 
quantifying a sequence's biological functionality, and that employ a better proxy for 
the unique foldability of a sequence than just its folding free energy. 

We see this paper as a proof of principle, demonstrating that it is in principle 
possible to construct solvable microscopic models of primary and secondary structure 
formation in hctcropolymcrs, with both long- and short-range forces, in which there 
is no need to assume (and average over) random amino-acid sequences or to find a 
formula for suitably non-random sequence statistics. This study represents a small 
step, but we believe it to be a step in a promising direction. 
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Appendix A. Identification of observables 

Appendix A.l. 'Slow' free energy as generator of observables 

In the stationary state, where both the fast degrees of freedom (0, giving the secondary 
structure) and the slow degrees of freedom (A, giving the primary structure) have 
equilibrated, expectation values of observables are given by two nested Boltzmann 
averages. Using definition ([5]) and (3 — n(3 the result can be written as 

tir^xw \ _E A e * (A) (G(0,A)) fast 

\{U{(p, AJ; f as t /slow — — 



£\ e -0H etl (A) 

^ ffCA jE^^(^A) 

£ e -/3ff f (0|A) 



E 

A 



f ,-P[u(X)+v(X)] , . 

A Zf (A) 



V V G(0\ A)e~^-[ Hf( ^° |A)+c/(A)+nA) ] 
A 1 ..^>" 



(A.l) 

with a — 1 . . .n. This latter expression is also obtained as the derivative of the 'slow' 
free energy /jy, provided we add a suitable generating term to the 'fast' Hamiltonian 
iff (0| A). To be precise, upon replacing 

fff(0|A)^ff f (0|A) + x iVG(0,A) (A.2) 

one obtains 

d 

((G(0, A)) fast ) s i ow = lim ^—f N (A.3) 

The validity of (|A.3[) , which allows us to use the free energy as a generating function 
for expectation values, follows immediately upon substituting (|A.2|) into ([6]): 

lim 9 l]m J_d y y e -^ E:=i [ xJVC? (0»A) +fff (0"|A) + a(A) + v(A)] 

x^o dx x^o nJV^ 5x V i 

A ...0" 

1 n Ea^cS 1 ,„ G(0 7 ,A)e"' 3 SLi[x G ^° A )+ Hf ^ Q l A )+^(A)+v(A)] 

— lim — N — 

^Oii^ EAE0 1 0" e _,3 S"=i[ xG( ^ A)+fft( ^ a|A)+t/(A+1/(A)) ] 

_ EaS^,..0" g(0\ A )e -^:^[ g ^°' A ) + ^ A ) + ^ A )] 

, n e -^^ =1 [J? f (0 o, |A)+c/(A)+y(A)] 

= e j8JV/*y^ G(0 1 ,A)e" /3 ^==i[ fff(<?! ' Q|A)+ ' 7(A)+l/(A) ] 
A 1 ...0" 

= ((G(0,A)) fas t)siow (A.4) 
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Appendix A. 2. Identification of order parameters for q = 2 

We next apply the general relations (|A.2IA.3[) for q = 2 to observables of the form 
G(er, A) = N^ 1 '^2 li g(ai,^i 1 rji). Here equations (|A.2IA.3[) translate into 

H t (a\X) -> H f (tr\X) + X^^^i) (A.5) 

i 

1 ^ d 

^2({9{°i, &, »7i)>6urt>siow = lim ^-/jv (A.6) 

We repeat our previous derivation of the free energy per amino-acid (|20[) but now 
with the new contribution x s(°ii £«j 7 ?i) included in the fast Hamiltonian Hf(a), 
in leading order in The new term changes (|14j) into 

M[ < r 4 _ 1 , < 7 i ,o- i+1 |m,k] = (( e «[^E«,(fc-+ m -"f 

x e _|3 ^ nJ 9"'(i E a *;a-fe*)+/3»?[./ s cr.+i-f ! _i-rii/]-/3xS a 9(o-?,£>'7)^ ^ 7^ 

From this we can immediately recover the identifications (|29l30p . For instance, 
choosing g(a, £, if) = <r£ gives 

X 

M[tri-i,(Ti,tTi + i\m,k] = M[tTi-i,tri,tr i+ i\ia - — (1, . .. , l),k] (A. 8) 

jp 

From this we extract, due to M[<Tj_i, <7j, Ci+i] only affecting the transfer matrix 
eigenvalue A(m, k), within the replica symmetric ansatz: 

fe^E^^siow = lim ±f N = -i lim ^tog\*L(jn-j-,k) 

1 d 

IogASL(m,*)|v=o = m (A.9) 



/3n J p 9m 

according to (|22|) . Similarly, making the alternative choice g(<y,^,rj) — £ gives in 
leading order in x' 

M[tr i -i,tr i ,(Ti + i\m,'k]= (A. 10) 

X 

M[(Ti-i,<Ti,(Ti +1 \in., k— — (1, . .. , l)]t)'(.)-Hi'(.)- X w"(.)/j p 



From this we extract, within the replica symmetric ansatz: 

1 9 

J im T7 y]((^) fast) slow = lim— /jy 
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i 



l-^«"(fc-fc*)]^logA«2 x (m,fc)| x= o= * (All) 



j3n J] 

according to (|23|) . The above identification of the scalar order parameters m and k 
was relatively easy since we could absorb the extra generating terms into those already 
present. This will generally not be the case. 
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Appendix A. 3. Joint distribution of primary structure variables 

We next turn to the calculation of the equilibrium amino-acid statistics as measured 
by 7r(£,f7) = liniAr^oo iV _1 X^((<K£ ~ ~ Vi)) fast) slow This distribution follows 

from (|A.5|A.6j) upon making the choice g(o~,£,rj) = 6(£ — £)S(t) — fj): 

H f (a\X) -> H t (a\X) + - |)£(rfc - ff) (A.12) 

i 

d 

K(£,,fi)= Jim lin i^-/jv (A.13) 

The calculation is now complicated by the fact that the convenient decomposition 
identity (|15[) n0 longer holds. Instead we now find, in replica-symmetric ansatz: 

M[ai-i,tTi, er i+ i|m, k] = M[o-j_i, CTi, er i+ i|m, k]| x=0 

+ 0( X 2 ) (A.14) 

so that 

JjM[o- i _i > o- i> o- i+ i|m,k]= JJr <Ti _ 1)<ri+1 (m,k) +0( X 2 ) 
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This leads us to 

ton ^log II W = logA«f ax (m,fc)| x=0 (A.18) 
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+ G(X 2 ) (A.19) 

and hence, with r(m, k) denoting the replica-symmetric version (12 ip of the transfer 
matrix r(m, k), with A^l^m, denoting the largest eigenvalue of T(m, k), and using 
the periodicity of the chain: 
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Tr{T N/2 {m,k)} 



(A.21) 



In the latter expression one must substitute for (m, &) the solution of the original x = 
saddle-point problem. We find once more a convenient effective decoupling of the odd 
sites from the even sites, as well as statistical independence of the single-site polarity 
and steric angle statistics, giving 7r(£, fj) = 7r(£)7r(7)) with the individual distributions 

tt(I) = (A.22) 

Eerer ^a^'a^ l ( m k)(5(£—£)e^^ Jp ( nk+m ^'<*' 7 t"^~ n,1 ~~ nJsV ' < - k ~ k *^) ^ e Py[J s tr-tr'-nv]^ 



lim 



Tr[T N ^(m, k)] 



<fl) = (A.23) 



lim 



N 



Tr[T N/2 (m,k)} 

The limit N —> oo can now be taken upon using the fact that for N — > oo one 
may write in leading order T cra i{m,k) — > A JV (m, ^<r" , where 

{u^-} and denote the left- and right-eigenvectors of r(m, fc) associated with 

the largest eigenvalue. In the result we can then substitute our expression (j4"Tj) for the 
largest eigenvalue and the replica-symmetric forms (I31I32P for the eigenvectors. For 
the polarity distribution 7r(£) this gives, after some further manipulations and with 
help of the definitions (|45I52|) : 



<0 



^§(£— ^ e P£[ J p( nk+m Y^ a cr T)~ n f J -- nJ g v '( k - k *)]^ ^ e /3r][J s (T-Cr'-nv]^ 



A(m, k)J2 



,L ,.R 



p(i) J dh cosh n (/3/i) / dx - x - |j p m)) 4 



M P(0 I dh cosh n (/3h) Jdx ^{h-x- £,J p m)^(x) 
r dh W(i, h) (A.24) 
For the steric angle distribution 77(77) one finds an expression with a similar structure: 

' U 0Up,(e l3 ^ Jp{nk+m ^<<*' Tc '' > ~ n ' J '~ nJgV '( k ~ k *^} / x 8(n-rf)e^ JsCT ' a ''~ n ' / ^ 

Jda;dx'$(xO*(^)((dl?y-^)e n ' 3[s(3: '' , '- 7s)+ « ( ' 7 '' fc -' 1 - J s l '' (fc - fc * )) -'' 1/1 cosh" [P(x+£ J p rn+A(x', n J s ))] 



J dx ^ x )^enBlB(m, v Js)+((Jpk-^,-J g v'(k-k*))-ur,]^ Jdxdx'$(x')y(x) cosh™ \J3(x + X')] 
Jdh cosh n ([3h) Jdx ^{x)^{h- A{x,i)J s )){8{n-f})e n0{B ^^ Ja) -^ u] ) ri 



Jdh cosh n (/3/i) Jdx $(x)$(fc - A(x,fjJ a )){e n n B ^'" J ^-^), 1 



(A.25) 
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Both in the limit n — > (fully random sequence selection) and in the limit [3 — > one 
sees both equilibrated distributions reducing to the prior statistics w(£) and w(f)), as it 
should. In general, however, one will find non-trivial distributions 7r(£) and Tr(fj), which 
reflect the complicated interplay between secondary and primary structure generation. 
Finally we observe that 7r(£) ^ except when J p m = 0; this suggests that, rather 
than the polarity distribution in the equilibrated system, the physical interpretation 
of p(£) is that of a prior distribution which would have been found in the absence of 
secondary structure formation. 



Appendix B. Saddle-point treatment of order parameter equations in the 
limit n — > oo 

Appendix B.l. Saddle-point treatment of the equation for ^!(x) 

Since we know that \&(x) — for |x| > J s , we may write without loss of generality 
\f!(x) = e nfi ^ for x G n C [- J s , J s ] and *(x) = for x <£ n, where J n dx e n WO) = 1. 
We also define for \x\ < y and y > the function 

C(x,y) = itanh _1 [tanh(/?x)/tanh(/3y)] (B.l) 

P 

It is the inverse of the function A(x, y) with respect to the variable x, since 
C(A(x, y),y) = x for all |x| < \y\. We note that C(0, y) = and sgn[C(x, y)] — sgn(x). 
We can now insert our expression for p(£) and the definition \&(x) = e™' 3 ^^^ (for x G SI, 
with \I/(x) = elsewhere) into our equation for ^(x), and use the function C(x,y) 
to subsequently transform variables inside the (5-distribution in the right-hand side. 
Since the Jacobian of this transformation will not be exponential in n as n — > oo, as 
a result of these manipulations we find for all x G fl an equation for ^(x) that is for 
n —>■ oo evaluated by steepest descent: 

lim ip(x) = lim — log < (B.2) 

J Q dyfd£dr) w(rj)w((,)5[C(x, r,J s )-y- j pTO ^] e "' 3 ^to)+?(./p-./ s )(fe-feo)+B(y+j p m^ I) j s )-^] 1 
/ n dy Jd^dn w(n)w(^)e n ^^ + ^^ J p~ J <i)( k - k o)+ B (y+ J p m i' 7 i-h)-vri] j 

i {^(y)+C(^ P -^s)(^- fc o) + S(y+ J p m^,7/J s )-^r/| 

max {-0(y)+f(Jp-^ g )(fc-fco) + S(j/ + Jpm^,?7J ; ,.)-iy?7} (B.3) 
yen, |£|<i, H<i i- J 

Solving the optimization problem (|B.3[) means calculating both the set fi C [—J s , J s ] 
and the function lim„^oo ip(x) for x G fl. Let us inspect some properties of this 
optimization problem in more detail. Since the maximization in the first line of (|B.3[) 
is over a subset of the set in the second line (instead of allowing for all y G O, in the 
first line we impose y = C(x,rjJ s ) — J p m£), it is inevitable that lining oo ^/>(x) < 
for all x G f2. We now know that V'max = linin— >oo max^gji ip(x) < 0. This leaves 
two options: ip max < versus ip max = 0. In the first case, however, we would get 
lim„^ co *(x) = lim n ^oo e n ^^ < lim,^^ e" w — = for all x G ft; this function 
can never be normalized. We conclude that V'max = 0. 



max 

yen, y=C(x,r 1 J B )-J pm i, |£|<1, |rj|< 
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Let us turn to those values of x for which one has lim,^^ ip(x) = ip max — 0. We 
call the set of those values f2* C ft: 



x e Q* : max < ip(y)+£(J P — J q )(k— ko)+B(y+ J v mLriJ s ) — vn 

yEtt, \£\,\r,\<l, x=A(y+J p mt,r,J„) 

f , \i){y)+Z{Jp-Jg){k-ko)+B(y+J p m£,r)J s )-v'q\ 

\7I\<1 i. J 



= max 
yen, \£\,\r 

(B.4) 

We see that with every combination (y, £, 77) that gives the maximum value in the 
second line there corresponds a value of x 6 fi*. If the maximum is obtained for a 
unique combination (y*,£*,Tj*), which apart from symmetries one must expect to be 
the generic case, then the set f2* contains just one element x* = A{y* + J p m£* , rj*J 8 ). 
It follows that one must generally anticipate limn^oo ty(x) to be a sum of a small 
number of <5-peaks. 

We can finally also use saddle-point arguments to express the limit n —> 00 of 
the free energy per monomer (|54| in terms of the function ip(x), the scalar order 
parameters (fc,m), and the set fl: 

lim ip=\j v (m 2 +k 2 ) - \j g {k 2 -k* 2 ) - \J p -J g \\k-k \ 

n — > 00 z Z 

max J ip(x) + £(Jp — J g )(k— fco) + -B(x+ J p m£, ryJ s ) — wq \ (B.5) 

In the remainder of this section we will not attempt to solve the problem (|B.3[) 
in its full generality, but rather construct two qualitatively different specific solutions 
of (|B.3|) . for which indeed ^(x) is found to reduce to either one or two 5-peaks, and 
which both reduce exactly to the unique solutions that we established earlier in the 
two limits J s — > or J p m — > 0. 

Appendix B.2. Homogeneous polarity states k = ±1 

Here we construct solutions of (|B.3[) where f2 = {a;*}, and show that these represent 
the continuation to arbitrary J s > and J p m ^ of the homogeneous polarity states 
k = ±1. Now we must have S7* = il and lim n ^ 00 tp(x*) = ^Wx = 0, and (|B.4|1 
becomes 

max L(£,n) = max L(£,vi) (B.6) 

J,J76[-1,1], E*=A(a:*+J p me,i7J.) £>*76[-l>l] 

= £{J P -Jg)(k-k ) + B(x*+J p m£,r)J s ) - vrj (B.7) 

In both sides of (|B.6p we maximize exactly the same object, but in the left-hand side we 
have the additional constraint that the values (£, rf) for which the maximum is found 
must allow the equation x* — A(x* + J p m£, r]J s ) to have a solution x* € [—Js, Js]- If 
the maximum in the (less constrained) right-hand side is obtained for an (£, rj) such 
that the equation x* = A(x* + J p m£,?]J s ) has no solution x* G [—J S ,J S ], then the 
extra constraint apparently interferes with the maximization and the two sides cannot 
be the same, so no solution with £7 = {x*} can exist. We conclude that the present 
type of solution exists if and only if both sides of (|B.6|) find their maximum at the 
same value (£, 77) (values that will depend on x*, since x* appears in the function to be 
maximized) , with the value of x* subsequently following from solution of the nonlinear 
equation x* — A(x* + Jpiri^f/Js): 

{€, V) = argmax^g^ 77) (B.8) 
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x * = A(x* + J p m£, f)J s ) (B.9) 

Since B(x,y) — B(\x\, \y\), and is monotonically increasing with both |x| and |y| we 
can immediately maximize with respect to r\ € [—1,1], giving f\ = — sgn(^). This 
simplifies our remaining problem to solving 

x* = -sgn(v)A(x* + J p m£, J s ) (B.10) 

£ =argmax fe[ _ 11] L(^) (B.ll) 

HO = i{J P -J 9 ){k-k ) + B(x*+J p m£, Js) (B.12) 

To resolve the remaining extremization we inspect the properties of B(x,y), in 
particular its second partial derivative in x. We find that the function is convex: 

= j2 TO 2 |l - 1 tanh 2 [/3(a;*+ J p m£ + J s )} 

- i tanh 2 J p m£ - J s )]| > (B.13) 

!/(£) can therefore only be maximal at the boundaries £ G { — 1,1}. Next we can 
rule out states with x* = 0, since substitution into (|B.10[) shows that it would be 
incompatible with £ = ±1. Due to J p m ^ 0, x* 0, and the monotonicity and 
symmetry of B(x,y), the function £(£) is not symmetric in £, hence its maximum is 
unique: 

£ = sgn{(J p - J g )(k-k ) + ^[B{x*+J pmi J,) - B{x*-J p m, J,)]} (B.14) 

Having solved the extremization problem for solutions with = {a;*}, resulting 
in the two coupled equations (|B. 10IB. 14|) we turn to the n — > oo limit of the order 
parameter equations (|81I82[) for m and k. We define 

R(£.) = i{J v - Jg){k-k ) + ~ logcosh[/3(J p m£+2 2 ;*)] (B.15) 

This is again a convex function, which is asymmetric in £ (due to x* ^ 0), and therefore 
takes is maximal value on the interval [—1,1] at the boundary 

£ = sgn{(J p - J g )(k-k ) + - log [ ^^.^ j } (B.16) 

Our equations for m and k can now be written as 

/dg tanh[/3( J p m£ + 2x*)]e n ^ R ^ 

m ~ ™ Jd£ w (£)e"' 3 «(«) 

= tanh[/3( Jpm + 2x*£)] (B.17) 

fc = n 1 ™ /de = * (R18) 

We have now confirmed that the present family of solutions with £1 = {x*} are indeed 
the generalization to arbitrary values of J s and J p m of the solutions k — ±1 with 
homogeneous polarity, as claimed. Putting all our final equations together, replacing 
£ by k e { — 1, 1} and using the full definition of B(x, y), gives the new set 

x* = - sgn(v)A(x* + J p m£, J s ) (B.19) 
m = tanh[/3( J p m + 2x*k)] (B.20) 
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k = sgn{(J p -J s )(fc-fc ) + A 
sgn{(J p - J g )(k-ko) 
:log 



log 



■cosh[/3(2x* + J p m)] 



cosh[/3(2x* 



J p m)] 



} (B.21) 



1 

4/3 



cosh[/3(x*+ J p m + J s )] cosh[/3(x*+ J p m— J s )] 



} (B.22) 



. cosh[/3(x*— J p m+ J s )] cosh[/3(x*— J p m — J s ^ 
In both of the limits J, — > and J p m — > we recover correctly the equations of the 
k = ±1 states as derived earlier for these special cases, viz. x* = 0, m = tanh(/3 J p m), 
and fc = £ = sgn[(J p - J ff )(fc-fc )]. 

Finally we try to compactify and simplify our equations. We first solve x* from 
(|B.20|) . which gives 

x* = fc[^tanh _1 (m) - ij p m] (B.23) 

Subsequent insertion into (|B.19[) leaves us with 



tanh[iarctanh(m) — ~/?J p m] 
tanh[iarctanh(m) — \f3J p m(\ — 2fc£)] 



= -sgn(i/)tan(/3J a ) (B.24) 



Furthermore, we notice that with k£ £ {— 1, 1} only the choice k = £ will allow the 
above equations to reduce to the equations for m — > that were found earlier, and 
that the alternative fc = — £ would make it extremely difficult to satisfy both (|B.21() 
and (|B.22j) simultaneously. Upon choosing £ = k and after additional rearranging and 
manipulation we can reduce our set of equations further to 



sgn(V) tan(/3 J s 



tanh[^/3 J p |m| — \ tanh 
tanh[i/?J p | 



i tanh 



(J p - J 5 )(l-fc fc)>^log 



2/3 



1 



coshftanh 



\m\] 

-2pJ p \m\] 



k k)> — log 



cosh [arct anh | m | ] 
cosh[arctanh|m| — 3/3J p |m|] +cosh(2/3J s ) 



(B.25) 

(B.26) 
(B.27) 



cosh [arctanh| m \ + [3 J p \m\] + cosh(2/3 J s 
The joint distribution W(h,£) of effective fields and polarities for the present 
solution is very simple: 

W(h, = S[h - kf3~ 1 tanh _1 (m)]*($ - fc) (B.28) 
Working out the free energy per monomer (|B.5|) for the above solution gives, using 
fco G ( — 1,1) and equation (|B.23|) to eliminate x*: 

lim (,5=ij p (m 2 +l) - ij s (l-fc* 2 ) - \J P - J g \(l-kk ) - (J p -J g )(l-kk Q ) 



B (_L t anh '(m) 
2p 



ij p TO, J s ) 



(B.29) 



Equation (IB.25|) gives a single transparent law from which to solve our order 
parameter m. Equations (|B.20IB.21[) give conditions for the solution of (|B.25[) to 
be acceptable; they are guaranteed to be satisfied for small m if J p > J g (due to 
| fco | < 1), whereas for larger m their validity needs to be checked explicitly. Equations 
(|B.20IB.21[) also suggest that, as was found explicitly in the simple cases J s = and 
J p m = 0, the most stable solution (and hence the thermodynamic state) will generally 
be the one with k = — sgn(fco). This completes our analysis of solutions with SI = {x*}. 
We always find k = ±1, viz. sequences with homogeneous polarity, provided J p > J g . 
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Appendix B.3. Inhomogeneous polarity states k = k$ 

In the same manner we now construct the continuation to arbitrary values of J s and 
J p m of the inhomogenous polarity states, where k — ko. For this case, where Jl no 
longer contains just one point, our equation (|B.3[) from which to solve linin^oo ^>{x) 
takes the following form 

ip(x) = max L{^,ri,y) — max L(^,ri,y) 

£,ne[-i,i], yen, y=c(x,r,j s )-j p m£ e^el-i.i], yen 

(B.30) 

HZ, V, V) = V-(y) + B(y + J p ™£, VJ.) - vr] (B.31) 

(provided x 6 fl). In contrast to the k ^ fco case, this equation has symmetries 

that can be exploited: it allows for will solutions with ip(—x) — ip(x) for all x G O, 

with SI symmetric around the origin. This is easily confirmed by working out the 

right-hand side of (|B.30|) under the assumption of this symmetry (via transformations 

like y — > — y and £ — > — £, which are allowed by the constraints) upon making the 

replacement x — > —x in the left-hand side and using L(— £,77, —y) = L(£,r),y) and 

C(-x,y) = C(x,y): 

ij}^c)-ij}(x)= max L(£,r],y) 

f,»je[-i,i], yen, y =c(-x,riJ s )-j p m£ 

— max L(^,r],y) 

£,?7e[-l,l], Vtn, y=C(x,r 1 J s )-J p m( s 

= max L(—^.r/,—y) 

£,ne[-i,i], yen, y=c{x,r]j s )-j p mi 

max L(€,T),y) = (B.32) 

f,»?e[-l,l], yen, y=C(x,riJ e )-J p m( 

We will now construct solutions for k — ko with this reflection symmetry. Inside 
(|B.30|) it allows us to transform without punishment y — > ysgn(i]) and £ — ► £sgn(ri), 
which gives a new expression that shows (using C(x,—y) = —C(x,y) and B(x,y) = 
B(\x\, \y\)) that both terms are maximized for sgn(Ty) = — sgn(z/), and the second term 
more specifically for the value r\ = — sgn(^). Upon abbreviating Q(x,£,r)) = {y e 
fi| y = C(x, rj,J s ) - J p m£}: 



max 

f,»76[-i,i], y en(x,z,n) 



L(Z,V>V)= , max \i/;(y)+B(y+J p m^,\r]\J s )-vr)} 

,^ n ma o, h ,s {V'(2/)+S(y+JpTO^, |7y|J s ) + |z/r?|| 
ICU»7l<i> yen(x,£,|7j|) 1. J 



and 



max 

f,?je[-i,i], yen 



L (£,V,y)= max {^(y) + B(y + J p m£, \rj\J s ) - ^77} 
C,i76[-i.i]i yen I J 

+ B{y+J p m£, J s )} + H 



= max 
|f|<i, yen 

We observe the potential consistency of assuming ip(x) to incease monotonically for 
x > 0. An increase in x leads via the constraint y £ Sl(a;,£, |r/|) to an increase of y 
inside the first maximization, so that ip(y) will increase. The term with B(.,.) will 
also increase if the sign of £ is chosen right. So we make the ansatz that ip(x) is 
differentiable, and that ip'(x) > on x > 0. This implies that SI = [— u, u], with 
max l£ n ip(x) — 4'( u ) = 0- The second maximization in (|B .30|) now reduces to 

^ {ip(y)+B(y+ J p m£, J s )} + = ^»(«) + B(u + J p |m|, J s ) + 

= S(t» + J p |m|, J s ) + \v\ (B.33) 



max 
yea, |e|<] 
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This simplifies our equation (|B.30|) for tp(x). For all x € [0, u] we now have 
$(x)= ^ max U(y)+B(C(x,\ V \J s ),\r ] \J s ) + M(\ V \-l)} 

\y\<u, y=C(x,\r]\J s )-J p m£, |£|,H<1 >- > 

- B(u + J p \m\, J s ) 

U(y)+B(C(x, |r?|J s ), \ v \J s ) + \u\(\ V \-l)\ 
\n\<i i. J 

- B(u + J p \m\, J s ) 



max 

\y\<u, \y-C(x,\ri\J s )\<J p \ m \, \r 



= nmx ,, ™* IIT , , . „{V'(l/)+-B(C(a: > MJ.).N^) + M(N-l)} 

M<1 ye[-u,u]n[C(x,\ri\J s )-J p \m\,C(x,\rj\J s ) + J p \m\] I ) 

-B(u+J p \m\,J s ) (B.34) 

Since ip(y) is monotonic in \y\, we need to be as large as possible for any given \rj\. 
Since the intersection interval (if it exists) is always biased to the right, we must find 
the largest allowed value y in the intersection interval. The intersection is seen to be 
empty if C(x, \ rj\ J s ) > u + J p \m\, whereas the remaining possible scenarios are 

u-J p \m\ < C(x, \i]\J s ) < u+J p \m\ : y max = u 

C{x, \n\J s ) < u-J p \m\ : y max = C(x, \i]\J s ) + J p \m\ 

Consistency with the premise x S [0, u] demands that we must identify the point 
where x becomes so large that the intersection interval is empty for any value of 1 77 1 
should be the boundary x = u. This, together with mini^|<i C(x, \n\J s ) — C(x, J s ), 
immediately gives us an equation for u: C(u, J s ) = u + or equivalently 

u = A(u + J p \m\, J s ) (B.35) 

Graphical inspection shows that this equation always has one unique non-negative 
solution u. Within our present construction we can always achieve a non-empty 
intersection set in (IB.34[) for suitable (£,77), and we may proceed with maximization 
over (77! . For each x £ Q we now have 

4>(x) = 

max f i>(C(x, \v\Js)+J P \m\)+B(C(x, \v\Js), \vW + W\M if C(x,\n\J 3 ) < u- J v \m\ 

|77|<i, o(a:,|77|Ja)<«+jp|m| | B(u + Jp \m\, \r)\ J B ) + \v\ \r)\ if C(x, \r)\J s ) > u— J p \m\ 

- B(u+J p \m\, J a ) - \u\ 



ip(C(x,z)+J p \m\)+B(C(x,z),z) + \v\z/J, if C(x, z) < u- J p \m\ 
*e[o,j,], G(x,z)<u+J p \m\ \ B{u + J p \m\, z) + \u\z/J s if C(x, z) > u—J p \m\ 

B(u+J p \m\, J s ) — H 



z e[C(x,^+j p \m\),J s ]\ B(u+ J p \m\,z) + \u\z/Js if z < C(x, u - J p \m\) 

- B(u+J p \m\, J s ) - \v\ (B.36) 

Since both C(x,z) and B(C(x, z), z) decrease monotonically with increasing z (see 
|Appcndix CP we are sure that for sufficiently small values of v we always find the 
maximum in (|B.36|) by substituting the smallest allowed value for z. We now proceed 
by assuming this property to hold for any value of v. If indeed we always need the 
smallest z, viz. z = C{x, u + J p \m\), we obtain for all x € [0, u]: 

4>(x) = B(u + J p \m\, C(x, u+J p \in\)) — B(u+ J p \m\, J s ) 

+ \vM*,"+Jp\™\) _ M (B . 37) 



tp(C(x,z) + Jp\m\) + B(C(x,z),z) + \v\z/Js if z > C(x, u - J p \m\) 
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This expression meets our requirements: it increases monotonically on [0, u], and 
(using the general identity C(x 1 C(x, y)) = y in combination with our previously 
established relation C(u,J s ) = u + J p \m\) one verifies that ifi(u) = 0. We take this 
as sufficient support for our ansatze; in addition we will find that for the purpose of 
evaluating the scalar order parameters (to, fc) and the phase diagrams we do not need 
the full shape of ip(x) but only the property that %jj(—u) = rp(u) = max^gsi ip{x) with 
it = A(u + J p \m\, Js). 

What remains in our present analysis is to work out the order parameter equations 
for to and fc, and confirm that these support the premise limn^oo k = ko. For large 
but finite n one would expect to have k = kg + k\jn + 0{n~ 2 ) for n — > oo, which 
implies that nj3^{J p — J g ){k — fco) = P£,{J P — J g )k\ + 0(rt _1 ). Similarly one would 
expect for large but finite n that log^x) = n(3ip(x) + ipi(x) + ©(n" 1 ), with ip(x) 
as given by (|B.37|) . Insertion of these forms into (|81l82p gives integrals over (x,y) 
that can be evaluated by steepest descent, with the relevant saddle-point obtained for 
x — y = wsgn(TO^): 



lim 

n — >oo 



fd£ w(£)£ J_ dxdy tanh[/3( J p m£ + x+y)]e 



n[log coah[p(J p mS+x+y)]+0ili(x)+l3^(, V )}+i> 1 (x)+ij> 1 (y)+l3((J p -Jg)ki 



fd£ U)(£) J" dxdy e n[lo S cosh[/3(J p rni+x+y)]+0i,(x)+l3i>(y))+i> 1 (x)+i> 1 (y}+gi(J p -Jg)k 1 

M u^)|£|sgn(m)tanh[/?(J p |m£| +2^]e nlogco ^ 

~ Jd£ lu (^) e «logcosh[/3(J p |mi|+2u)]+2Vi(uSgn(mO) + ^«(Jp-Js)fcl 

(B.38) 

so 

\m\ = tanh[/3(J p |m| + 2u)\ (B.39) 
Similarly we must solve 

fdf; w((;)(; f v dxdy e ™[ 1 °g cosh [/3(./p^+^+y)]+W(^)+/9V'(y)]+'i/'i(^)+'0i(y)+/3?(./p-^ 9 )fci 



ko = lim 



fd£ J" dxdy e n l l oScosh[/3(J p m^+x+y)]+0^(x)+l3^(v)]+iP 1 (x)+^ 1 (y)+^aJ P ~J g )ki 



[d£ w (^)^ e ™ lo gcosh[/3(J p |m5|+2t l )]+2^ 1 (t l Sgn(m5))+/3?(Jp-J s )fel 

= lim - 

n ^oo J^t w (^ e nlogcosh[/3(J p |m5|+2n)]+2^ 1 («Sgn(m5))+/35(J p - l 7 9 )fe 1 
e 2!/.i(uSgn(m))+/3(J"p-J s )fei _ e 2V>i(-uSgn(m))-/3(J p -J s )/!i 
~ e 2Vi(«Sgn(m))+/3(J p -J 9 )fei _|_ e 2^i(-uSgn(m))- / 3(J p -J s )fe 1 

As soon as a solution for the non-leading order ipi(x) exists, there will be a value 
of fci that give the desired value fc = fco. However, careful inspection of the sub- 
leading orders in the functional saddle-point equation for fy(x) reveals that the above 
construction works for v < 0, but no finite solution ipi (x) exists when v > 0. 
In the latter case it turns out that the solution of the problem scales with n as 
log^a;) = (3mp(x) + ^\{x)^n + O(n ) and fc = fco + kij^/n + . . .. For a detailed 
analysis of the different sub-leading orders see | Appendix D| 

The final result is that fc = fco solutions always exist (although they will be locally 
stable only for J g > J p ), and that the associate value of the order parameter to is to 
be solved from the two coupled equations 

\m\ = tanh[/3(2u+ J p |to|)] (B.41) 

tanh(/3tt) = tanh[/3(u + J p \m\) tanh(/3J s ) (B.42) 



(B.40) 
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The sign of m is arbitrary, both solutions m = ±|m| are allowed and equally likely. 

i/J-Hanh-^H) - | 



We solve the first equation for u, giving u = h/3 1 tanh 1 (|m|) — i J p |m|, and obtain 



an equation involving |m| only: 

■| m n - iR.UmW 

tanh(/3J s ) (B.43) 



tanh[i tanh 1 (\m\) — ^/3J p \m\ 



tanh[| tanh 1 (|m|) + ^(3J p \m\ 

The joint distribution W(h, £) of effective fields and polarities for the present solution, 
where J p m ^ 0, is found to be 

W(h,£) = -(l+k )S(£-l)6[h-J p m-2usgn(m)] 



+ ~(l-k )S(£+l)S[h+J p m+2usgn(m)] (B.44) 



1 

2 ( 

The free energy per monomer (|B.5|) for the present type of solution is found to reduce 
to 

- B^r 1 tanh-^lm]) + \j p \m\, J s ) (B.45) 

Appendix C. Properties of the functions C(x,y) and B(C(x,y),y) 

The functions B(x,y) and C(x,y) are defined as 

flfoy) = ^log^cosh^^+^cosh^^-y)]] (C.l) 
2/3 - 

C(x,y) = ^"Hanh'^tanh^/tanh^y)] (C.2) 

We are only interested in the regime where y > and \x\ < y. The function 
C(x,y) is monotonic and anti-symmetric in x, and obeys sgn[C(a;, y)] = sgn(xy) and 
|C(a;,y)| > \x\. It is the x-inverse of A(x,y), since 

A(C(a;,y),y) = /3 _1 tanh _1 [tanh(/3C(a;,?;))tanh(/3y)] 

= tanh -1 [tanh(/3a;)] = x 
C(A(x,y),y) = /T 1 tanh _1 [tanh(^A(a;, y))/ tanh(/3y)] 

= P^ 1 tanh -1 [tanh(/3x)] = x 
Furthermore C(x,y) obeys the general identity 

tanh(/?x) 
. t anh (/3a; ) / 1 anh (/?y ) . 
= /3 _1 tanh _1 [tanh(/3y)] = y (C.3) 

The function B(x,y) is symmetric in x; thus also the function B(C(x,y),y) is 
symmetric in x. The partial derivatives of C(x,y) are 

_9 = tanh(/3y) [1- tanh 2 (/3a;)] 

9x 1 ' yj tanh 2 (/3y) -tanh 2 (/3a;) 
9 tanh(/3x) [l-tanh 2 (/3y)] 

3y ~ tanh 2 (/3y) -tanh 2 (/3a;) 



C(x,C(x,y)) = /3 _1 tanh" 



^C(x, y) = - — ;V (C5) 
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Next we work out and simplify the quantity B(C(x,y),y) with the help of identities 
such as 



2cosh[tanh- 1 (m) + J3y] = e Py ( 

V 1 — m 



' \l-mJ 



2 cosh 



cosh 



tanh 1 



/tanh(/3a;) \ 



tanh(/3y) 



)-Pv 



tanh 2 (/3y) + tanh 2 (/3x) 
tanh 2 (/3y) - tanh 2 [fix] 



This results in 

B(C{x,y),y) = ^log<| -i <-o.]j ( tanh 1 



tanh(/?x) 



= 4 lo g[2cosh(^?/)] - ilogcosh(/3a; 

P P 



1 



log 



Hence we have 



d 



tanh(/3y) 

~ 2/3 

tanh(flx) [l-tanh 2 (/3y)] 



/3y ) cosh ( tanh" 



cosh(2/3y) 



_i r tanh(/3x) . 
^tanh(/3y) 



1 - 



tanh 2 (fix) 



tanh 2 ((3y) 



(C.6) 



—B(C(x,y),y) = 2 — - — -3— 
tanh {fly) - tanh (/3a; 



(C.7) 



Thus, in the region |x| < \y\ we have J^B{C(x,y),y) < for x < and 
J^B(C(x, y), y) > for x > 0. The function B{C{x, y), y) is symmetric in 2, diverges 
at x — ±y, and has a unique minimum B(C(0, y),y) = /3 _1 log[2 cosh(/3y)] at x — 0. 



Appendix D. Analysis of sub-leading orders for the state k = ko as n ^ 00 

Here we analyze in more detail the sub-leading terms in n of the nontrivial solution of 
our equations (|80|81|82p for the case where J g > J p , i.e. where m/0 and k = ko, as 
n — ► 00. Given the exponential scaling with n of the kernel in (|80|) . we may without 
loss of generality for n — >— > 00 always write ty(x) in one of the following two forms: 
either : 9(x) = e^OO+V-ito+otn- 1 ) (D.l) 
or: \&(x) — e n ^( x )+Vn-^i(^)+o{n a ) (D.2) 
Since ip{x) was found to be maximal at x = ±u (where u > 0), we find in both cases 
lim &{x) = at>(x - u) + (1 - aWx + it) (D.3) 

n — >oo 

where 



scaling (|D.1|1 
scaling (jD.2[> 



e </>i(«)^_ e ^i(-«) 

a = 9[ipi{u) - ipi{-u)] 

We will show below that for v < the solution is of the form (|D. 1|) . with k 
ko + kt/n + . . ., 

Vl + sgnM^o {^/T+Wl-V^^Wl) 



(D.4) 
(D.5) 



fci = 0, 



2|*dl 



(D.6) 



and with lim„_^ oc p(^) = u>(£), whereas for v > the solution is of the form (|D.2I) . 
with k — kg + k\ I ^pn + . . . , 

ipi(-^a)-ipi(u) 



fci = 

/3sgn(m) {Jp-JgY 
and with lim n _, 00 = <5[£+sgn(fci)]. 



a = 6[ipi(u)-'ipi(- J u)] 



(D.7) 
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Appendix D.l. First scaling ansatz: 0(n ) sub-leading terms 

If we simply substitute (ID.3[) and k = ko + k\/n + . . . into equation (|80[) . we find 

and 

aS(x - u) + (l-a)5(x + u) = (D.9) 

a Jd^d-q p(C)u;(?7)(5r2:-yl(J p me+u,?7J ;s )le"' 3 [ s ( J '' m5+u ' ,)Js )- iy ' ) ] 
hm r- 

n ~ > °° fd£dr) p(£)w(rj)< a e n ^ B( - J p m ^ + ^ r ' Js ^ u ^ + (l-a)e n ^ B( - J p m ^- %T i J ^- vr i^ I 

(1-a) fd^dr] p(OMv)^\x-A(J p mC-u,nJ s )]e n ^ B ^p m ^-^ J ^~^ 

+ lim p r- 

IWO ° fd£dr) p(O w (v)\ ae n ^ B ( J " m ^ + ^ r ' Js '>-' / ^ + (\-a)e n P[ B ( J p' mi i- u < , nJs)-i>rj\ L 

Since r\ G [— 1,1] and B(.,.) is symmetric and monotonically increasing in both 
arguments, the leading exponentials are maximal for r\ = — sgn(i/) and £ = ±sgn(m), 
so 

aS{x - u) + (l-a)8(x + u) = (D.10) 

ap(sgn(m))6[x-+sgn(h*)A(J p \m\-Hi, J s )] + {l-a)p(-sgn(m))S[xsgii(v)A(J p \m[Hi, J s )] 

ap(sgn(m)) + (1— a)p(— sgn(m)) 

There are two possibilities for solution, dependent on how we match the two (5-peaks 
on either side of this equation. One always ends up with u to be solved from 

u = A(J p \m\+u, J s ) (DAI) 

but, since u > 0, the specific matching depends on v. For v < one is forced to choose 

Q , e /3Sgn(m)(J p -,/ g )fc 1 



Q , e /3Sgn(m)(J p -J 3 )fe 1+ ( 1 _ a ) e -/3Sgn(m)(,/ p -,/ s )fc 1 

whereas for v > the only option is 

(l_ Q ,) e -/3Sgn(m)(J p -J s )fc 1 
a ~ Q , e /3Sgn(m)(J p -J 9 )fe 1+ (' 1 _ Q ,) e -/3Sgn(m)(J p -J ! ,)fe 1 

To proceed with equations (I81|82p for m and k we first calculate 



(D.12) 



(D.13) 



d£W(h,S)tf(h)= (D.14) 

fd£dxdy w(i)e K( - J p- J ^ kl ^{x)^{y)f(x + y + j p m^e nlo s cosh ^( x +y+ J p' m ^ 
J d^dxdy w(^)e f3 ^ ( - J p~ J !>) kll i'(x) 1 i'(y)e nlo s cosh ll 3 ( x +v+ J p' m O] 

aV(2^J P M)e 0Sgn(m)(Jp ~ J9)fc Ml^^ 



lim 



sgn(m) 



a 2 e /3Sgn(m)(./ p -./ B )fc 1 _|_ n_ Q ,)2 e - i asgn(m)(J p -J g )fc 1 

Application of this formula to f(h) — tanh(/3/i) and f(h) = 1 gives 

|m| = tanh[ ( 9(2u+ J p \m\)] (D.15) 

Q ,2 e/ 3Sgn(m)(J p -J g )fc 1 _ ( 1 _ a )2 e -/3Sgn(m)(J p -./ 9 )fc 1 

fc = sgn(m) a2e/ 5 S g n(m)( j p _j s)fel - ( - 1 _ Q ,^2 e -0sgn(m)(j p -j s )fe 1 ( D - 16 ) 

So far we have successfully recovered the equations for u and m are as derived earlier; 
the next question is whether we can find a corresponding solution for k\ and a. 
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Both (|D.12p and (|D.13[) are quadratic equations for a, so we expect at most two 
solutions. In fact for v > only one of these is in the interval [0, 1]: 

is<0: a£{0,l} (D.17) 
v > : a = — — rrr - (D.18) 

1 _|_ e /3Sgn(rn)(J p -J 9 )fci v ' 

For v < 0, combination with (|D.4ID.16|) subsequently gives 



, n yi^gn(m)fc (y/HM- y/HM) mi0 , 
fci=0, a = — (D.19) 

For v > 0, on the other hand, the solution breaks down. Upon writing k\ in terms of 
a and substituting the result into (|D.16|) . we find the trivial k$ = 0. Thus, only for 
the degenerate special case ko = is the solution of our equations for v > of the 
form (ID.lj) . We conclude that the generic solution for v > scales differently with n. 

Appendix D.2. Second scaling ansatz: 0[y/n) sub-leading terms 

If we substitute (|D.3j) and k — ko + ki/y/n + . . . into equation (|80p (where ki ^ 0, 
since otherwise we return to the previous scaling case) we get 

limp(£)=«[£ + sgn(Afi)] (D.20) 

n — >oo 

and 

aS{x - u) + (l-a)8(x + u) = (D.21) 

a fd£dr] p(£)w(r])5 \x-A(J p m&u, T) j s )] e nPlB<.Jprnt+«, v Js)-'>v] 
hm r 

(1-a) Jd£dn p(Ow(v)S[x-A(J p m^-u,n,J s )]e n ^ B ^ J '' mi -^ J ^-^ 



lim 



n-^oo J^jj p (Q w ^^ ae n0[B(J p mt+u,vJ.)-vn\ + {l- a ) e n0[B{J p mi-u,r 1 J B )-U7i\ j 

Once more the dominant exponent is maximal when r\ = — sgn(^) and £ = ±sgn(m), 
so 

aS(x — u) + (l—a)S(x + u) — 

eV ^[Vi(«)+/3sgn(m)(j p -j g )fc 1 ] 5\ x+sgn MA(J p \m\+u,J s )] 
lim — 

n^oo e v / ^[V'l(«)+/3Sgn(m)(J p -J 9 )fc 1 ] _|_ e v^[i/)i(— u)-^Sgn(m)(J p - J a )fci] 

e VS[*i(-u)-/wgn(T»)(Jp-Ji,)fci] ^[a;-sgn(iy)A(J p |m|+M, J,)] 

+ n^So e v / ^[V'i(«)+/3Sgn(m)(J p -J s )fci] _|_ e vMi/>i(-u)-/3Sgn(m)(J p - J g )fci] ^ ' ' 

Again we have to match the two <5-peaks on both sides. Since we know that the 
equation u = —A(J p \m\ + u, J s ) has no non-negative solutions u (for J p m ^ 0), we 
are forced to match S(x ± u) to 8\x ± t4( J p |m|+u, J s )] . From this we recover equation 
(|D.11[) . as required, but now with 

e vM^i(«)+/3Sgn(m)(J p -J s )fci] 

v < . a — lirn^ ev ^^ 1 ( u ) +i3S g n ( m ) ( j !) _j g)A . 1 ] ~ e vH[i/-i(-«)-/3sgn(m)(j p -7 3 )fci] 

(D.23) 

e Vn[i/'i(-tt)-/3Sgn(m)(Jp-J g )fei] 
> . a — lirn^ ev ^[^ 1 ( u)+|3S g I1 ( m )( i7!) _j g)fcl ] — e ys[ 1 /'i(- i 0-/3Sgn(m)(j p -j ! ,)fei] 

(D.24) 
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Our present equations can be obtained from those of the previous scaling regime upon 
substituting k% — > \fnk\ and ipi( x ) ~ ¥ \/nil>i(x). This allows us to take over the 
previous evaluation of the order parameter equations for m and k, provided we make 
the appropriate substitutions. For m we then recover equation (|D.11|) (as required), 
whereas the equation for k gives 

eV ^T[2V'i(«)+/3Sgn(m)(J p -J g )fc 1 ] _ e v^[2Vi(-u)-/8Sgn(m)(Jp-J s )fci] 

k sgn(m) - lirr^ e ^ 2 ^ 1 (u)+i3sgri( m )(j p -.j g )k 1 \ - e ys:[2^ 1 (-t l )-/3sgn(m)(j p - JJkT] 

(D.25) 

We have now successfully recovered the expressions for u and m derived earlier; the 
remaining question is whether we can find a corresponding solution for k\ and a from 
the coupled equations (|D.5|D.23ID.24ID.25|I . Since |fc | < 1 we conclude from (|D.25|1 
that the following must be true, so that the O(^fn) terms cancel and the O(n ) terms 
can indeed give us |fco| < 1' 

fa = / l( 7 U l"/ l( "A CD.26) 
/?sgn(ra)(J p - J g ) 

This solution for k\ we can insert into our previous equations for a, which gives 

v < : a = lim - ~FTT\ = 1 ~ a ( D - 27 ) 

v>Q: a = lim ^— - = a (D.28) 

Apparently, for v > the present scaling ansatz gives self-consistent solutions. For 
v < we find a = |, and hence fci = which is forbidden since it effectively brings us 
back to the previous scaling regime. We conclude that, apart from degenerate limits, 
the two scaling ansatze (jD.llD.2p are complementary: for v < the system is in a 
state of the type (|D.1|) . whereas for v > it is in a state of the type (|D.2|) . 



